Abstract

Planning and Resource Allocation (P/RA) Human Supervisory Control (HSC) systems
utilize the capabilities of both human operators and automated planning algorithms
to schedule tasks for complex systems. In these systems, the human operator and the
algorithm work collaboratively to generate new scheduling plans, each providing a unique
set of strengths and weaknesses. A systems engineering approach to the design and
assessment of these P/RA HSC systems requires examining each of these aspects individually,
as well as examining the performance of the system as a whole in accomplishing its
tasks. An obstacle in this analysis is the lack of a standardized testing protocol and a
standardized set of metric classes that define HSC system performance. An additional
issue is the lack of a comparison point for these revolutionary systems, which must be
validated with respect to current operations before implementation.

This research proposes a method for the development of test metrics and a testing
protocol for P/RA HSC systems. A representative P/RA HSC system designed to perform
high-level task planning for deck operations on United States Naval aircraft carriers is
utilized in this testing program. Human users collaborate with the planning algorithm to
generate new schedules for aircraft and crewmembers engaged in carrier deck operations.
A metric class hierarchy is developed and used to create a detailed set of metrics for this
system, allowing analysts to detect variations in performance between different planning
configurations and to depict variations in performance for a single planner across levels
of environment complexity. In order to validate this system, these metrics are applied in a
testing program that utilizes three different planning conditions, with a focus on validating
the performance of the combined Human-Algorithm planning configuration.

Analysis of the experimental results showed that the protocol succeeded in providing
points of comparison for planners within a given scenario while also explaining the
root causes of variations in performance between planning conditions. The testing
protocol was also able to describe relative performance across complexity levels.

The results demonstrate that the combined Human-Algorithm planning condition performed
poorly for simple and complex planning conditions, due to errors in the recognition
of a transient state condition and in modeling the effects of certain actions, respectively.
The results also demonstrate that Human planning performance was relatively
consistent as complexity increased, while combined Human-Algorithm planning was
effective only in moderate complexity levels. Although the testing protocol used for these
scenarios and this planning algorithm was effective, several limiting factors should be
considered. Further research must address how the effectiveness of the defined metrics
and the test methodology changes as different types of planning algorithms are utilized
and as a larger number of human test subjects are incorporated.

1. Introduction


Sheridan defined Human Supervisory Control (HSC) systems to be those in which
“one or more human operators are intermittently programming and continually receiving
information from a computer that itself closes an autonomous control loop through artificial
effectors to the controlled process or task environment” [1]. While Sheridan’s original
work considered the teleoperation of robots, HSC systems can also include systems
that utilize automated algorithms to schedule task assignments or perform path planning
for various agents [2-6]. These will be referred to as Planning and Resource Allocation,
or P/RA, HSC systems, a model of which is provided in Figure 1 (adapted from
Sheridan’s original HSC model in [1]).


[Figure 1 diagram: Human Supervisor, Planning/Resource Allocation Algorithm, Environment]

Figure 1. Human supervisory control diagram for P/RA HSC systems,
modified from Sheridan [1].


Within a P/RA HSC system, the human operator engages a planning algorithm
through a set of control interfaces in order to create a feasible plan of action. Once the
plan has been deemed acceptable, it is transmitted to and implemented by the agents in
the environment. The planning algorithm then monitors the execution of this plan via
sensors in the environment, relaying information back to the operator through a set of
display interfaces. A simple but common form of this is the automobile GPS system, in
which drivers input a destination and a set of preferences to an automated planning algorithm,
which returns a suggested driving path to the driver. After route acceptance, the
system then continually updates the driver on the status of the route, sensed through a
GPS receiver, and relayed through a visual display and auditory alerts.

In more complex planning domains, such as military command and control environments
[7, 8], the value of P/RA HSC systems lies in the complementary capabilities of
human and automated planners. Automated planning algorithms are capable of processing
and incorporating vast amounts of incoming information into their solutions. However,
these algorithms are brittle and unable to account for conditions that are outside the
programmed parameters, especially in uncertain environments [9]. Also, despite the
speed at which algorithms can process information, human operators retain superiority in
pattern recognition and the ability to adapt to changing conditions [10, 11]. The human
ability to satisfice, or to provide feasible solutions that only address a subset of the overall
problem, has also been shown to be highly effective [12, 13]. Recent research has
shown that by properly allocating functions between human operators and automated systems,
performance superior to either entity alone can be achieved [4, 14, 15]. In the context
of P/RA HSC systems, human planners can rely on their experience to determine the
factors most important to system performance (as they would otherwise do when satisficing).
Communicating these factors aids the algorithm in the development of a local solution
that often outperforms the solutions generated by the human or algorithm individually.

The design of P/RA HSC systems requires a systems engineering approach, which
addresses the performance of both the human operator and the algorithm, the interactions
between them, and their ability to function together in executing system tasks [16]. This
approach stems from the belief that successful system performance is a product of both
effective component design and effective component integration. The Mars Climate Orbiter
(MCO), for example, was destroyed on entry into the Martian atmosphere due to a
difference in measurement units between two subcomponents [17]. Although the individual
units tested properly, the error in unit consistency went undetected, resulting in a total
mission loss. While the MCO case is an extreme result, it highlights the necessity of
evaluating interactions between components within the system. Regardless of the performance
of the human operator and the algorithm within a P/RA system, if the two cannot
effectively communicate in order to execute tasks, the overall effectiveness of the
system will likely be diminished.

Viewing this from a systems engineering perspective, several models provide guidance
for the development of P/RA HSC systems. Two of these models, the Waterfall [18]
and “V” models [19], only address the highest level of process task definition (e.g.,
Analysis and Design in the waterfall model). This thesis will focus on a third model, the
spiral model [20, 21], which divides these high level tasks into multiple phases of planning,
requirements definition, risk analysis, and testing. This set of four steps is continually
repeated throughout the process. Figure 2 shows a spiral model for the development
of a generic software system. As the spiral moves outward, the design process moves
from lower to higher levels of abstraction, beginning with a basic definition of the concept
of operations in the center and concluding with final acceptance testing and implementation.
The construction of the spiral model also provides guidance to the designer as
to where to move if a test shows deficient system performance. For example, the Integration
and Test stage includes the final assembly of system components and tests of their
ability to interact effectively. Should this test fail, the engineering process should likely
return to the Design Validation and Verification stage to adjust component design parameters,
or to the Integration and Test Plan stage if the method of component interfacing
requires alteration.


[Figure 2 diagram: spiral model; recoverable labels include "Cumulative Cost", "Determine Objectives, Alternatives, Constraints", and "Commitment Partition"]

Figure 2. Systems engineering spiral model as adapted to software engineering [20, 21].


The spiral model in Figure 2 is used as a basis for discussion throughout the remainder
of this thesis. Specifically, this thesis will address the two final test steps, highlighted
in grey in Figure 2 - the Integration and Test and Acceptance Test stages. The former
addresses the effectiveness with which the human operator and the algorithm interact
within the system, while the latter addresses the ability of the combined system to effectively
perform tasks in the environment. This thesis will address the development of
measurement metrics and a testing protocol for evaluating the performance of P/RA HSC
systems in these two test steps, addressing both the human and algorithmic components
of the system. The testing protocol and measurement metrics should also be generalizable
to a wide range of P/RA HSC system domains and algorithm formats. The metrics developed
for this protocol are both descriptive and diagnostic, providing empirical comparison
points between systems while also identifying the properties of a single system that
led to its efficiency (or inefficiency).

1.1. Problem Statement

A systems engineering approach to the evaluation of P/RA HSC systems requires a
holistic, comprehensive testing protocol. An obstacle to the creation of this protocol is a
lack of both standardized metrics and a standardized methodology of metric definition.
While standardized metrics and frameworks exist for defining the performance of both
human operators [22, 23] and automated planning algorithms [24, 25], no standardized
frameworks are currently in place for the interaction between humans and automated
P/RA systems, or for system (mission) performance overall.


[Figure 3 diagram; recoverable labels: "System", "No standards"]

Figure 3. Existence of standardized metrics for HSC systems.

The goal of this thesis is to understand how a set of metrics should be defined for and
applied to a P/RA HSC system, and how an analysis of the resulting data can provide insight
into the strengths and weaknesses of a human-automation collaborative system. A
metric class hierarchy from prior literature [26, 27] is used to guide the creation of metrics
for a representative P/RA HSC system, the Deck operations Course of Action Planner
(DCAP). The DCAP system utilizes an automated scheduling algorithm to aid operators
in replanning tasks in the aircraft carrier deck environment, which is generalizable to
a large number of planning and resource allocation HSC systems. Metrics are defined for
this system and utilized in an experimental simulation testbed that examines performance
over varying complexity levels. The discussion of these testing results addresses both the
comparison of system performance within each testing scenario, as well as the performance
of the systems across complexity levels. The next section of this chapter details the
specific research questions that will be addressed in this thesis.

1.2. Research Questions

This thesis addresses three specific questions:

1. What metrics are required for the evaluation of a Planning and Resource Allocation
Human Supervisory Control system as compared to manual planning?

2. How can these metrics assess the variations in performance of human and combined
human-algorithm planning agents?

3. How can these metrics predict system feasibility and highlight possible design
interventions?

1.3. Thesis Overview


This thesis is organized into six chapters. Chapter 1, Introduction, describes the motivation
and research objectives for this thesis. Chapter 2, Prior Work, details prior research
concerning planning and resource allocation algorithms and the creation of metrics
for human performance, automated algorithm performance, and the interaction between
these elements. Chapter 3, the Deck Operations Course of Action Planner (DCAP), explains
the features of the DCAP system and its embedded automated algorithm. Chapter
4, Performance Validation Testing, describes the creation of the metrics used in the
analysis of the DCAP system, the testing scenarios, the creation of a set of operator planning
heuristics, and subsequent testing of the system. Chapter 5, Results and Discussion,
details the results of the application of the defined metrics to the resultant simulation data
and the information gained from this process. Chapter 6, Conclusions and Future Work,
reviews the contributions of this research with regard to the defined research questions and
also addresses future research questions.

2. Prior Work


This chapter provides a review of metrics previously used in validating the performance
of planning algorithms and HSC systems (including both P/RA and more generic
HSC systems). The first section in this chapter presents a framework for HSC metric
classification taken from prior literature. This framework is used as an organizational tool
for the remaining sections of the chapter, which provide details on the specific types of
metrics utilized in prior literature.

2.1. The HSC Metric Hierarchy

Several non-standardized metric class hierarchies have been developed for HSC systems
[28-31]. Metrics can be differentiated into classes according to their attributes, primarily
in terms of the object of the application. P/RA HSC metrics can be separated into
classes for Human, Automation, and Mission performance as well as Human-Automation
Interaction. Mission Performance metrics describe the ability of the system, as a whole,
to accomplish its goals in the environment. Automation Performance describes the ability
of the automated components - such as automated algorithms or sensors - to perform
their specific tasks. Measures of Human-Automation Interaction typically describe the
active processes the operator uses to input commands to or acquire information from the
system (e.g., mouse clicks), while Human Performance measures typically describe features
native to the human operator (such as fatigue or stress). Table 1 provides a brief
representation of four prior metric hierarchies according to this metric class structure.

Table 1. Metric classes from prior work.

                              Human        Automation   Human-Automation  Mission
                              Performance  Performance  Interaction       Performance
Olsen and Goodrich [28]                    X            X                 X
Steinfeld et al. [29]         X            X            X                 X
Crandall and Cummings [30]    X            X            X
Scholtz [31]                  X                         X

Olsen and Goodrich’s hierarchy [28] focused almost exclusively on quantifying robot
performance, with most metric classes excluding measures for the human operator; those
that considered human operators examined only their interaction with the autonomous
platform. Steinfeld et al. [29] included classes addressing both the human and automated
aspects of the system, as well as the effectiveness of the overall system. However, the
class of human performance metrics did not differentiate between human-automation interaction
and individual human performance measures. Steinfeld’s hierarchy also lacked
depth in the definitions for human and system performance (only three metrics appear in
each category), but did provide numerous metrics for automation performance.

Crandall and Cummings [30] created metrics for single-robot and multi-robot systems
and created additional measures addressing human interaction. This hierarchy did not include
direct measures of system performance, although it did provide a differentiation
between human performance and human-automation interaction. Scholtz’s [31] hierarchy
addressed measures dealing primarily with the human operator and their interaction with
the system, but lacked metrics for system performance and automation performance.

While each of these four hierarchies is lacking in some manner, they combine to address
each of the aspects of P/RA HSC systems as depicted in Figure 3. However, as a
whole, only the automation performance class contains a large number and variety of example
metrics; the remaining categories include few, if any, examples. These deficiencies
were also noted by Pina et al. [26, 27], who incorporated the work of these (and
other) authors in creating an expanded and detailed categorical structure for HSC metrics.
The five main categories developed from this work, with additional subcategories, are
shown in Table 2.


Table 2. Pina et al.'s [26, 27] metric classes and subclasses.

Mission Efficiency
  • Time based
  • Error based
  • Coverage based

Autonomous Platform Behavior Efficiency
  • Adequacy
  • Autonomy
  • Usability
  • Self-awareness

Human Behavior Efficiency
  • Attention Allocation Efficiency
  • Information Processing Efficiency

Human Behavior Precursors
  • Cognitive Precursors
  • Physical Precursors

Collaborative Metrics
  • Between Humans
  • Between Autonomous systems
  • Between Human and Automation

Pina et al.'s [26, 27] five metric classes consider each of the various aspects of a general
HSC system and encompass much of the previous metric hierarchies, while also providing
additional detail in the definition of subclasses for each category. Pina et al. also
include an additional class of Collaborative measures, which is an additional perspective
on human-automation interaction. Collaborative measures address the sociological aspects
of the system, considering the automation to be a member of the “team” of operators
performing system tasks. These measures may address the effectiveness of collaboration
between multiple human operators, multiple automated agents, or between human
and automation.

Mission Efficiency metrics measure the performance of the system as a whole as it
performs tasks within its domain - a critical issue in the Acceptance Test stage in the spiral
model. The remaining categories address the performance of individual subcomponents
and their efficiency of interaction, supporting the Integration and Test stage in the
spiral model. Autonomous Platform Behavior Efficiency contains measures for the effectiveness
of an algorithm in its computations and its capability to support the human operator
in his or her tasks. Human Behavior Efficiency measures address the performance of
the human operator as he or she engages the system through both cognitive (information
extraction) and physical (command input) means. Human Behavior Precursor metrics examine
the endogenous factors that affect human interactions (such as physical and mental
fatigue or operator situational awareness). The final category, Collaborative Metrics, addresses
the degree to which human users and automated agents are able to work together
to accomplish tasks. Figure 4 highlights how Pina et al.'s classes of metrics apply to the
P/RA HSC metric hierarchy originally shown in Figure 1.


Figure 4. HSC diagram highlighting Pina's metric classes.

The metrics included in these categories can fulfill both descriptive and diagnostic
roles. All metrics are descriptive with respect to some aspect of the system. For Planning/Resource
Allocation systems, descriptive metrics document the objective performance
of the system and its subcomponents (the human operator and algorithm). For a path
planning system, a descriptive Mission Performance measure may address the total travel
time on the path or the cost of the path (e.g., total work). Descriptive measures for the algorithm
may address the total time required to replan or may take the form of a scoring
function applied to the solution. A descriptive measure for the human operator may include
a rating of their situational awareness or trust in the system.

These same measures can also be used in a diagnostic manner, explaining the performance
of other metrics. While total mission completion time is a widely used descriptive
measure, it has no ability to explain the conditions within the environment that lead
to its final value. This can only be revealed by additional diagnostic metrics that illuminate
specific details of the performance of the system. For instance, a metric noting that
the human operator required more time to execute replanning tasks may provide one explanation
for high values of mission completion time. Metrics demonstrating that the system
performed poorly on a single mission subtask may provide an alternate explanation
for this same factor. Additionally, a second round of diagnostic measures can be applied
to each of these cases in order to determine why the human operator required more time
to replan (longer time to perform a replanning subtask) or why the mission subtask required
more time (deadlock in path planning or unnoticed failure). This can continue iteratively
until a definite root cause explanation is obtained.
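
The iterative diagnosis described above can be pictured as a walk down a tree of metrics. The following Python sketch is illustrative only: the metric names, values, and baselines are hypothetical and are not taken from any system discussed in this thesis. It assumes that larger values are worse and treats a metric as anomalous when it exceeds its baseline, descending until it reaches anomalous metrics that have no anomalous children.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metric:
    """A metric with an observed value, a baseline, and finer-grained diagnostic children."""
    name: str
    value: float
    baseline: float
    children: list[Metric] = field(default_factory=list)

    def is_anomalous(self) -> bool:
        # Assumption: larger values are worse (e.g., times or error counts).
        return self.value > self.baseline


def root_causes(metric: Metric) -> list[str]:
    """Return the names of the deepest anomalous metrics at or beneath `metric`."""
    if not metric.is_anomalous():
        return []
    deeper = [name for child in metric.children for name in root_causes(child)]
    # If no child explains the anomaly, this metric is the best available explanation.
    return deeper or [metric.name]


# Hypothetical values, for illustration only.
mission_time = Metric("mission completion time", 120.0, 100.0, [
    Metric("operator replanning time", 30.0, 20.0, [
        Metric("replanning subtask time", 15.0, 8.0),
        Metric("plan review time", 5.0, 6.0),
    ]),
    Metric("mission subtask time", 40.0, 45.0),
])

causes = root_causes(mission_time)
print(causes)
```

In this hypothetical case the search attributes the long mission completion time to a single replanning subtask, mirroring the second round of diagnostic measures described in the text.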




The ultimate goal of the system is to provide maximum effectiveness in performing
tasks in the environment, which is revealed primarily by measures of Mission Efficiency.
However, in cases where poor mission performance is seen, descriptive measures may
not effectively identify the mechanisms leading to problems. Thus, a combination of metrics
addressing each of these factors - the mission, human operator, algorithm, and human-automation
interaction classes - is needed to provide a full analysis of the system
[26, 27]. The remaining sections of this chapter will address measures for each of these
aspects individually as they relate to P/RA HSC systems.

2.2. Metrics for Mission Efficiency

Measures of Mission Efficiency address the ability of the complete HSC system to
perform tasks in the world, and exact definitions for these measures depend on the environment
in which the HSC system acts. Pina et al. [26, 27] differentiated Mission
Efficiency measures into Error-based, Time-based, and Coverage-based measures.
This section will address these measures and provide examples used in prior studies,
focusing on those used in P/RA systems.

Error measures identify the number of errors that occur in the execution of the P/RA
system solution. Errors can be attributed to either the human or the automation performing
inappropriate actions (errors of commission) or not fulfilling desired objectives (errors
of omission). For a path planning P/RA system, the returned solution may be a path
that avoids specific areas (such as threat zones) while minimizing costs or collisions [4,
32, 33]. Error measures for such a path planning system may track how many collisions
occur or how much threat is accrued by flying into unacceptable areas (both are errors of
commission). In other cases, the inability to address tasks within a given time window
[34, 35] can be considered as errors of omission (failing to perform certain tasks). These
measures are descriptive and diagnostic only with respect to the performance of the returned
solution. The identification of the specific actions on the part of the human operator
or the algorithm that lead to this performance can appear in other metric classes.
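
A minimal sketch of how the two error types might be counted from execution records is given below. The record format, field names, and event labels are assumptions made for illustration; they do not come from any of the cited systems.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Hypothetical record of one scheduled task after plan execution."""
    name: str
    deadline: float              # end of the allowed time window
    completed_at: float | None   # None if the task was never performed


def errors_of_omission(tasks: list[TaskRecord]) -> int:
    """Count tasks that were not performed within their time window."""
    return sum(
        1 for t in tasks
        if t.completed_at is None or t.completed_at > t.deadline
    )


def errors_of_commission(events: list[str]) -> int:
    """Count inappropriate actions logged during execution (assumed event labels)."""
    inappropriate = {"collision", "threat_zone_entry"}
    return sum(1 for e in events if e in inappropriate)


# Hypothetical execution data, for illustration only.
tasks = [
    TaskRecord("refuel", deadline=10.0, completed_at=8.0),    # on time
    TaskRecord("launch", deadline=20.0, completed_at=25.0),   # late
    TaskRecord("recover", deadline=30.0, completed_at=None),  # never performed
]
events = ["waypoint_reached", "collision", "threat_zone_entry", "waypoint_reached"]

omissions = errors_of_omission(tasks)
commissions = errors_of_commission(events)
print(omissions, commissions)
```

Counting both error types separately keeps the measure descriptive of the returned solution while leaving the question of who caused each error to the other metric classes.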

Time-based measures include all temporal measures, primarily addressing the total
time of solution execution (the mission time or mission duration) [36-40]. By definition,
however, these are limited to systems with a temporal component. For P/RA HSC systems
that perform time-independent task allocations, these measures may not be important.

Coverage-based metrics can also be included in some cases [3, 41, 42]. A common
application of military P/RA planning systems is in target destruction tasks, where an algorithm
supports a human operator in identifying, tracking, and destroying hostile targets.
In some cases, the number of targets may outnumber the available resources, making the
destruction of every target impossible. The percentage of total targets destroyed can be
used as a measure of overall performance for the system [41]. Additionally, a measure of
missiles fired per enemy target destroyed is descriptive in terms of the actions of the algorithm
but may also be diagnostic in revealing the efficiency of system actions. In this
case, high values of missiles fired per target destroyed can explain poor overall performance
(e.g., the system did not effectively utilize its resources or the missiles had difficulty
in reaching and destroying targets) [42].
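
The two coverage measures mentioned above reduce to simple ratios. The sketch below is illustrative: the function names and the example counts are hypothetical, and returning infinity when no targets are destroyed is an assumed convention rather than one taken from the cited studies.

```python
def coverage_percent(targets_destroyed: int, total_targets: int) -> float:
    """Percentage of all targets that were destroyed."""
    if total_targets <= 0:
        raise ValueError("total_targets must be positive")
    return 100.0 * targets_destroyed / total_targets


def missiles_per_target(missiles_fired: int, targets_destroyed: int) -> float:
    """Missiles expended per destroyed target; higher values suggest less efficient use."""
    if targets_destroyed == 0:
        # Assumed convention: no targets destroyed means unbounded cost per target.
        return float("inf")
    return missiles_fired / targets_destroyed


# Hypothetical mission outcome, for illustration only.
coverage = coverage_percent(targets_destroyed=6, total_targets=8)
efficiency = missiles_per_target(missiles_fired=15, targets_destroyed=6)
print(coverage, efficiency)
```

Read together, a low coverage percentage with a high missiles-per-target ratio would point toward inefficient resource use rather than a simple shortage of resources.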

These Mission Efficiency measures examine the effectiveness of the generated solution
in light of the system objectives, but they are not the sole indicator of a well-performing
system [16]. These measures are, however, the primary descriptive metrics of
the system and are often the primary criterion on which system implementation is based.
These measures are also affected by the individual performance of the human operator
and the algorithm and the quality of interaction between the two, each of which must be
considered in the course of system evaluation. The next section will address one of these
aspects - the ability of the algorithm to support the mission, described by measures of
Autonomous Platform Behavior Efficiency.

2.3. Metrics for Autonomous Platform Behavior Efficiency

In P/RA HSC systems, an algorithm does not necessarily exist as an independent
agent and may interact with a human operator in order to perform system tasks. In this
regard, the algorithm must perform adequately within the system and provide sufficient
support for mission operations. The ability of the algorithm and the associated interface
to accomplish these objectives was included as part of Autonomous Platform Behavior
Efficiency in Pina et al.'s metric hierarchy [26, 27]. However, in selecting metrics that
define the performance of the automated algorithm, the type of algorithm and the domain
of application will determine the number of applicable metrics. Before providing examples
of Autonomous Platform Behavior Efficiency, this section will review common
types of algorithms used in P/RA HSC systems and the domains to which they are applied.

2.3.1. Common Algorithms for Planning and Resource Allocation


Several different forms of planning algorithms have been proposed for, or implemented
in, P/RA HSC systems [8, 15, 32, 33, 36, 40, 43-45]. However, these various algorithms
can be grouped into three categories based upon the assumptions made and actions
taken by the algorithms. These three classes are Deterministic, Probabilistic, and
Heuristic algorithms [46, 47]. Table 3 provides a brief comparison of these algorithm
classes.


Table 3. Comparison of Algorithm Classes.

                     Deterministic          Probabilistic            Heuristic
                     Algorithms             Algorithms               Algorithms

Assumed knowledge    Complete               Incomplete               Incomplete
of the world

Decision basis       Fully defined cost     Probability density      Heuristic rules
                     functions and          functions
                     constraints

Examples             MILPs, Dynamic         MDPs, Kalman/particle    Tabu search,
                     Programming,           filtering, RRTs          hill-climbing
                     Clustering                                      algorithms


Deterministic algorithms utilize explicit cost functions and constraint models to perform
an exhaustive search of the domain. For correct solutions, these algorithms require
access to all information relating to these cost and constraint models. If this information
is accessible, these algorithms will return an optimal solution (if one exists). This class
includes Mixed-Integer Linear Programs [32, 44, 45], graph exploration algorithms such
as Breadth- and Depth-first search, potential fields [36, 40], and many other forms [8, 15,
33, 43].

In contrast with deterministic algorithms, probabilistic algorithms [48-51], such as
Markov Decision Processes (MDPs), assume incomplete knowledge about the world and
calculate responses based on probability models. Kalman and particle filters also fall
within this class, but instead use mathematical filtering techniques to reduce the level of
incompleteness of information. For instance, systems performing target tracking may not
know the exact location of the target to be tracked, but may be able to build a probability
density function describing the likelihood of the target’s location on the map [49].
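
As a rough illustration of the target-tracking example, the sketch below keeps a discrete probability distribution over candidate map cells and updates it with Bayes' rule after one noisy sensor reading. The four-cell map, the `update_belief` name, and the likelihood values are hypothetical and are not taken from the cited systems.

```python
def update_belief(prior, likelihood):
    """Return the posterior over map cells given a prior and sensor likelihoods.

    prior[i] is the probability that the target is in cell i before the
    reading; likelihood[i] is P(reading | target in cell i). Both inputs
    are assumed values for illustration.
    """
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("the reading is impossible under the prior")
    return [u / total for u in unnormalized]


# Four map cells, initially equally likely to contain the target.
prior = [0.25, 0.25, 0.25, 0.25]
# Hypothetical sensor model: the reading is most consistent with cell 2.
likelihood = [0.1, 0.2, 0.6, 0.1]
posterior = update_belief(prior, likelihood)
```

The planner never learns the exact location; it only sharpens the distribution, which is the sense in which such algorithms reduce, rather than remove, incompleteness of information.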

Heuristic algorithms [37, 41, 52-59] also assume incomplete information about the
world, but do not rely on probabilities in order to make choices. Instead, heuristic algorithms
rely on a set of heuristics - “rules of thumb” [60] - to calculate responses. For
these cases, the exact equation describing optimality may not be known, due to either
problem complexity or to the inability to model certain constraints accurately. In this
case, a heuristic function, such as a scoring metric [61, 62], can be used to judge the relative
performance of the system. This class of algorithms includes Tabu search [41, 52-54]
and hill-climbing algorithms [61, 62], as well as several other forms [37, 55-59].

The amount and types of data available from the environment can influence the
choice of algorithm for the system. Deterministic algorithms typically require complete
data on the state of the world. For a tracking task, the algorithm requires precise information
on the terrain, the current position of the target, and information about the current
state of the tracking vehicle and its capabilities. Probabilistic algorithms can compensate for
cases where the system does not have precise information on the location of the target, as
noted above. The system’s solution could be optimal on average but may not be optimal
for any one case. Heuristic algorithms can be used when the characteristics of an optimal
solution or the world are not known, or if the problem space is complex enough that
complete, feasible solutions are not expected. In this case, the system iterates through
possible solutions, selecting candidates for the next iteration based on a set of basic rules.
These algorithms do not guarantee optimality, and instead seek solutions that reach a certain
threshold of performance - a concept known as “satisficing” [63].
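
The iterate-and-select behavior described above can be sketched as a simple hill-climbing loop that stops as soon as a candidate reaches a satisficing threshold. This is a minimal illustration under assumed names (`hill_climb`, `score`, `neighbors`) and a toy scoring function; it is not the algorithm used by any of the cited systems.

```python
def hill_climb(score, start, neighbors, threshold, max_iters=1000):
    """Greedy local search that stops once `score` reaches `threshold`.

    Returns (solution, satisficed). The search also stops at a local
    optimum or after max_iters, in which case satisficed may be False.
    """
    current = start
    for _ in range(max_iters):
        if score(current) >= threshold:
            return current, True
        # Basic rule: move to the best-scoring neighboring candidate.
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current, False  # local optimum below the threshold
        current = best
    return current, score(current) >= threshold


def score(x):
    # Toy scoring metric (hypothetical): best possible value is 0 at x = 7.
    return -(x - 7) ** 2


def neighbors(x):
    return [x - 1, x + 1]


solution, satisficed = hill_climb(score, start=0, neighbors=neighbors, threshold=-4)
```

With a threshold of -4 the search halts at the first candidate that is "good enough" rather than continuing to the optimum at 7, which is the satisficing behavior in miniature.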

By  definition,  HSC  systems  require  the  presence  and  interaction  of  human  operators 
in  order  to  accomplish  tasks.  The  required  level  of  interaction  may  vary  as  described  by 
Sheridan’s  ten  levels  of  automation,  listed  in  Figure  5. 


Automation Level   Automation Description

1                  The computer offers no assistance
2                  The computer offers a complete set of decision/action alternatives, or
3                  Narrows the selection down to a few, or
4                  Suggests one alternative, and
5                  Executes that suggestion if the human approves, or
6                  Allows the human a restricted time to veto before automatic execution, or
7                  Executes automatically, then informs humans, and
8                  Informs the human only if asked, or
9                  Informs the human only if it, the computer, decides to.
10                 The computer decides everything and acts autonomously, ignoring the human

Figure 5. Sheridan and Verplank's Ten Levels of Automation (adapted from [64]).

In one extreme (Level 1), the human operator performs all tasks without any aid from
the automated system. At the other extreme (Level 10), the automated system performs
all tasks, requiring no assistance from (and offering no notifications to) the human operator.
The remaining eight levels comprise the majority of Human Supervisory Control systems,
with operator workload and input gradually decreasing as level increases. For the
case of P/RA systems, many exist in the range between Level 3 (the automated system
provides several options) and Level 6 (the system executes a suggested solution unless
vetoed by the human operator). In each of these cases, the human operator and automated
system work collaboratively in order to perform a shared task, with the operator possibly
providing suggestions to the system or selecting (or vetoing) one of several suggested
solutions. In this context, the performance of the P/RA algorithm requires measuring the
ability of the algorithm to support these collaborative mission-replanning tasks. These
measures of Autonomous Platform Behavior Efficiency are discussed in the next subsection.


2.3.2.  Measures  of  Autonomous  Platform  Behavior  Efficiency 

Pina et al. divided measures for Autonomous Platform Behavior Efficiency into four
categories (Table 2). Measures of Adequacy address the ability of the algorithm to support
the mission computationally, focusing on the accuracy and reliability of the algorithm.
Originally, this category only considered qualitative measures for the entire HSC
system and did not differentiate between the interface and the algorithm. For P/RA systems,
this can be expanded to include traditional algorithm performance measures such as
runtime and error (infeasibility/incompleteness) rates (see [24, 25, 65-70] for examples).
Within P/RA systems, ideal algorithms would be both highly reliable and highly accurate,
accepting user inputs and creating valid plans with no errors.
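
A minimal sketch of how runtime and error rates might be summarized over repeated planner runs is given below. The `PlannerRun` record and its fields are hypothetical; a real study would define feasibility and completeness in terms of its own constraint set.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PlannerRun:
    runtime_s: float   # wall-clock time taken to return a plan
    feasible: bool     # the plan violated no constraints
    complete: bool     # the plan scheduled every requested task


def adequacy_summary(runs):
    """Summarize mean runtime and error rates over a list of PlannerRun records."""
    if not runs:
        raise ValueError("at least one run is required")
    n = len(runs)
    return {
        "mean_runtime_s": mean(r.runtime_s for r in runs),
        "infeasibility_rate": sum(not r.feasible for r in runs) / n,
        "incompleteness_rate": sum(not r.complete for r in runs) / n,
    }


# Hypothetical results from four replanning requests.
runs = [
    PlannerRun(1.0, True, True),
    PlannerRun(3.0, True, False),
    PlannerRun(2.0, False, False),
    PlannerRun(2.0, True, True),
]
summary = adequacy_summary(runs)
```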

In the context of P/RA systems, measures of Usability address the human operator’s
subjective opinions of the algorithm’s ability to support the mission, as well as the ability
of the user to understand how to interact with the algorithm through the display and control
interface. This may involve a subjective evaluation of the system by the user [37, 71]
or asking the user to explain their thought processes during interaction [72, 73]. In other
cases, these qualitative measures have been operationalized into quantitative measures
[74-77], such as tracking the order and duration of the user’s interactions with the system and
comparing these to the actions of an expert user [74-76]. With regard to P/RA systems,
these measures may also ask the user to evaluate the performance of the algorithm, and their
ability to understand it, through surveys [3] or other means. The goal of these measures
is to determine the ease of use of the algorithm - that is, whether the user is able to
understand its actions, predict its performance, and interact with it appropriately.

Autonomy refers to how well the system is able to function without operator interaction.
This measure has its roots in Human Robot Interaction (HRI) research [6, 28, 30, 35,
78-80], where the performance of one or more robots can degrade over time. This resulted
in measures of “neglect” [28, 78] that judged how long the system could maintain
performance above a certain threshold without operator input. This may not be an applicable
measure for every P/RA algorithm. If the current plan or schedule does not degrade
in performance until an exogenous event occurs, and the algorithm requires human interaction
in order to replan, neglect tolerance is dependent on the occurrence of the exogenous
event and not on the algorithm. For all systems that do not exhibit these two conditions,
typical measures of neglect tolerance apply.
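
For systems where neglect tolerance does apply, one simple way to compute it from a logged performance trace is sketched below. The sampling format, the `neglect_tolerance` name, and the 0.80 threshold are assumptions made for illustration; the cited HRI work defines the measure in its own terms.

```python
def neglect_tolerance(samples, threshold):
    """Return how long performance stays at or above `threshold`.

    `samples` is a list of (time, performance) pairs in time order, with
    the first sample taken at the moment the operator stops interacting.
    If performance never drops below the threshold, the full observed
    span is returned.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    start = samples[0][0]
    for time, performance in samples:
        if performance < threshold:
            return time - start
    return samples[-1][0] - start


# Hypothetical trace: performance decays while the system is neglected.
trace = [(0, 0.95), (10, 0.90), (20, 0.82), (30, 0.74), (40, 0.60)]
tolerance = neglect_tolerance(trace, threshold=0.80)
```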

Self-awareness is primarily intended for autonomous systems that can independently
self-monitor and self-diagnose their performance. In the context of P/RA systems, this
would involve the ability of the algorithm to relate the actual performance of its solutions
to the predicted performance. This could potentially lead to the algorithm adjusting its
parameters as time continues (known as “on-line” learning; see [81]). For a P/RA system,
the inclusion of this ability would be beneficial to the embedded algorithms. However,
this is still an area of emerging research [82-84] and is beyond the scope of this present
work.

Metrics within the class of Autonomous Platform Behavior Efficiency address the
performance of the automated algorithm and its ability to support the P/RA HSC system
in its tasks. While these measures address the planning algorithm and its responses to
human input, they do not address the performance of the human operator as he or she interacts
with the system. These measures for human performance are divided into categories
of Human Behavior Efficiency and Human Behavior Precursors and are addressed in
the next section of this chapter.

2.4.  Metrics  for  Human  Performance 

Human performance can be characterized along two dimensions. Measures for Human
Behavior Efficiency address the effectiveness with which the human operator actively
engages the system through the interface elements, acquiring information and executing
tasks. Human Behavior Precursors address how the efficiency of this interaction is
affected by certain endogenous factors inherent to the human operator. These precursors
simultaneously both influence and are influenced by Human Behavior Efficiency. Each
of these factors will be discussed individually in the following subsections, beginning
with Human Behavior Efficiency.




2.4.1.  Metrics  for  Human  Behavior  Efficiency 


For P/RA HSC systems, physical interaction with the algorithm is mediated by a display
interface. Thus, measures of interaction efficiency address the ease with which the
user assimilates information from and executes control inputs through the system interface.
Pina et al. divided these measures into two separate categories - measures for Attention
Allocation and Information Processing.

The ability of the user to successfully distribute their attention across multiple competing
tasks in order to assimilate incoming information is referred to as Attention Allocation
Efficiency (AAE). For P/RA systems, this involves the operator’s ability to monitor
the overall state of the world, to monitor the state of individual priority tasks, to be
aware of failure and status messages, and to execute replanning actions. Measuring the
operator’s efficiency in doing these tasks requires understanding what aspects of the interface
the user is interacting with at a given time and with what purpose. These measures
can be taken by tracking mouse cursor location [77], eye-tracking [85, 86] or through
verbal reporting on the part of the user [73, 87, 88]. The end goal of these measures is to
determine the effectiveness of the user’s information acquisition process and to highlight
any necessary design changes that can aid the operator in attending to the correct information
at the correct times.

The efficiency of attention allocation with the system can be viewed as how well operators
process their internal queue of tasks. In this view, the operator is a processing
server in which incoming tasks are received, processed, and then removed from the queue
[89]. As such, the efficiency of this interaction assesses how long tasks wait to be added
to the queue and, once there, how long they must wait to be addressed. These measures
are defined as Wait Time due to operator Attention Inefficiencies (WTAI) [90] and Wait
Time due to Operator (WTO) [79], respectively. Ideally, these measures would be minimized,
indicating that the user has sufficient attentional resources to identify necessary
tasks and work through them quickly.
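
Under the queueing view above, both wait times can be computed from three timestamps per task. The sketch below is an assumed operationalization: the `TaskEvent` fields, and in particular the use of a "noticed" time to mark entry into the operator's queue, are hypothetical and may differ from the definitions in [79] and [90].

```python
from dataclasses import dataclass


@dataclass
class TaskEvent:
    arrived: float   # time the task first required attention
    noticed: float   # time the operator became aware of it (entered the queue)
    started: float   # time the operator began working on it


def wait_times(events):
    """Return (mean WTAI, mean WTO) for a list of TaskEvent records."""
    if not events:
        raise ValueError("at least one event is required")
    wtai = [e.noticed - e.arrived for e in events]  # wait to enter the queue
    wto = [e.started - e.noticed for e in events]   # wait while in the queue
    return sum(wtai) / len(wtai), sum(wto) / len(wto)


# Hypothetical log of three tasks (times in seconds).
events = [
    TaskEvent(arrived=0.0, noticed=2.0, started=5.0),
    TaskEvent(arrived=10.0, noticed=10.0, started=14.0),
    TaskEvent(arrived=20.0, noticed=24.0, started=26.0),
]
mean_wtai, mean_wto = wait_times(events)
```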

Information Processing Efficiency (IPE) metrics are aimed at determining how effectively
the user interacts with the system while performing a single control action. These
measures are highly influenced by research in the field of Human-Robot Interaction (a
subset of HSC), and include tracking explicit interactions with the interface, the time of
interaction with interface segments, and the rate of decisions made or actions performed
[26, 27]. Highly efficient operators will require a minimum number and duration of interactions
in order to perform system tasks.

Similar to AAE measures, IPE measures may also involve tracking user interaction
with the interface. For a P/RA system, IPE measures denote the overall time and number
of interactions required to complete the replanning task (and individual subtasks). Here,
the outright number of mouse clicks and total time of activity in each interface are of concern
(as opposed to AAE measures, which address the order in which these tasks and
subtasks are performed).
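
Click counts and activity time per interface can be tallied from an interaction log along the following lines. The log format and the segment names are hypothetical and are used only to make the sketch concrete.

```python
from collections import defaultdict


def interaction_summary(log):
    """Total clicks and active time per interface segment.

    `log` is a list of (segment, clicks, duration_s) tuples, one per
    interaction episode. Returns {segment: (total_clicks, total_time_s)}.
    """
    clicks = defaultdict(int)
    seconds = defaultdict(float)
    for segment, n_clicks, duration_s in log:
        clicks[segment] += n_clicks
        seconds[segment] += duration_s
    return {segment: (clicks[segment], seconds[segment]) for segment in clicks}


# Hypothetical log from one replanning task.
log = [
    ("priority_panel", 3, 12.5),
    ("schedule_view", 5, 20.0),
    ("priority_panel", 2, 7.5),
]
summary = interaction_summary(log)
```

Because the tally discards the order of episodes, it captures only the IPE view; an AAE analysis would need the sequence of segments visited as well.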

Combined, these measures of Human Behavior Efficiency address the operator’s ability
to perform one or more tasks, and how well they manage switching between competing
tasks. These may be both descriptive, allowing comparisons of user interaction times
across different interface types, and diagnostic, providing guidance to designers on what
aspects of the interface require a redesign. These measures are jointly influenced by the
design of the system interface and endogenous human factors, such as fatigue and situational
awareness, which affect the user. The latter, termed Human Behavioral Precursors,
are discussed in the next section.

2.4.2.  Metrics  for  Human  Behavior  Precursors 

Human Behavioral Precursors are underlying factors that affect the performance of
users in an HSC system. These can either be physical factors, such as physical fatigue
and sleeplessness, or cognitive factors, such as mental workload and situational awareness
(SA). Several references provide discussions on this topic (see [22, 23] for examples),
but for P/RA HSC systems, the predominant factors are the cognitive precursors
and their effect on human decision making. For P/RA systems, SA measures will address
the ability of the operator to maintain awareness about the current operational state of the
overall schedule, of individual schedules for vehicles (or other entities), of the presence
of errors in the system, and the ability of the operator to forecast future system states.

Mental workload has been quantified by surveys [3, 34, 37, 42], user interaction
measures [3, 39, 91-94], or measures of utilization [89, 95], which express user interaction
time as a percentage of the total mission duration. These are primarily descriptive measures,
where excessively high values may lead to poor decision-making and an inability of
the human operator to perform at optimal efficiency. For P/RA systems, the mental effort
required to understand the actions of the algorithm might be addressed through a survey,
while the effort required to implement actions in the system could be tracked by measures
of utilization.

The measures discussed in the previous two sections address the specific interaction
between the human operator and the P/RA HSC system. Measures of Human Behavior
Efficiency focused on the specific interactions between the user and the automation,
while measures of Behavior Precursors addressed endogenous factors that affect this interaction.
A third aspect of interaction addresses the effectiveness of the human operator
and the algorithm in working as a team. These measures of Human-Automation
Collaboration are discussed in the next section.
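
Of the workload measures described in this section, utilization is the most direct to compute. The sketch below assumes that interaction episodes are logged as (start, end) times and merges overlapping episodes before dividing by mission duration; the function name and log format are hypothetical.

```python
def utilization(interaction_intervals, mission_duration):
    """Percentage of the mission during which the operator was interacting.

    `interaction_intervals` is a list of (start, end) times; overlapping
    intervals are merged so that no time is counted twice.
    """
    if mission_duration <= 0:
        raise ValueError("mission_duration must be positive")
    busy = 0.0
    current_start = current_end = None
    for start, end in sorted(interaction_intervals):
        if current_end is None or start > current_end:
            # Close the previous merged interval, if any, and open a new one.
            if current_end is not None:
                busy += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        busy += current_end - current_start
    return 100.0 * busy / mission_duration


# Hypothetical 600-second mission with three interaction episodes.
percent_busy = utilization([(0, 60), (50, 120), (300, 360)], 600)
```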

2.5. Metrics for Human-Automation Collaboration

Measures of Human-Automation Collaboration consider the automation to be an active
participant in system performance; the automation is treated as a teammate working
with the human operator. This class of metrics includes the user’s trust in the automated
system [96-104] and the adequacy of the user’s mental model of the automation [105-108].
For P/RA systems, this addresses the ability of the user to understand the planning
and/or resource allocation strategies of the algorithm and the effects of user input on
these strategies. Trust denotes how much confidence the user has in the ability of the algorithm
to aid in creating a viable solution. In cases where a human has the choice between
a manual and automated planning system, a lack of trust in the system or the inability
to form accurate mental models may lead the user to return to manual planning
and reject the automation, regardless of its performance. For a P/RA HSC system, measures
of trust typically correlate with the willingness of the operator to accept the plans generated
by the automation, or if automation is optional, may explain the operator’s utilization
of the automated system [109, 110].

Mental models are “the mechanisms whereby humans are able to generate descriptions
of system purpose and form, explanations of system functioning and observed system
states, and predictions (or expectation) of future system states” [105] and may take
many forms [107]. When a human operator has a highly accurate mental model, they are
better able to understand and predict the performance of the automation. This engenders
trust in the user, which continues to build until the system exhibits unexpected behavior.
Inaccurate mental models can be a product of the operator’s inability to understand the
system or of unreliable system performance and may result in major accidents or abandonment
of the system [105, 106, 108]. For P/RA HSC systems, this class of measures
concerns how well the user is able to understand the processes of the automated algorithm.
For instance, Rapidly-exploring Random Tree (RRT) algorithms randomly select
points in space when creating paths in an environment. This randomness may make it difficult
for a human operator to understand the planner’s actions, build a mental model, and
predict system behavior. A less random algorithm, such as an ILP that has a set cost function,
may be more predictable for the operator.

2.6.  Chapter  Summary 

This chapter has reviewed a metric class hierarchy put forth in previous literature [26,
27] and discussed its application to P/RA systems. The first section of the chapter reviewed
the contents of this hierarchy, followed by four sections describing its five metric
categories. The second section addressed measures of Mission Efficiency, which describe
the functionality of the system as a whole. This was followed by a section discussing
common algorithm types for P/RA systems, which classified algorithms in terms of three
basic categories of functionality. Limitations on applicability and appropriate metrics for
each were also discussed in this section, as were measures of Autonomous Platform Behavior
Efficiency. These measures serve to describe the ability of the algorithm to support
operations within the HSC system. The third section of this chapter discussed two
categories of human performance measures. Human Behavior Efficiency addresses the
operator’s active engagement of the interface, while Human Behavior Precursors measure
the passive influences that affect this engagement. The final section of the chapter discussed
overall measures of the collaboration between the human and the automated system.
Together, this set of five metric classes allows for an analysis of the performance of
the system as a whole (Mission Efficiency), as well as the performance of individual subcomponents
(the remaining four classes). As noted in this chapter, the specific definition
of metrics within these classes is dependent on the characteristics of the automated planner
and the domain in which the system functions. The next chapter will provide a description
of the representative system used in this thesis, the Deck Operations Course of
Action Planner (DCAP). DCAP is a representative P/RA HSC system requiring the interaction of
a human operator and an embedded planning algorithm to reschedule operations on the
aircraft carrier deck.




3. The Deck Operations Course of Action Planner (DCAP)

The DCAP system was developed for the purposes of exploring planning under uncertainty
for heterogeneous manned-unmanned environments. DCAP is a representative
P/RA HSC system, as DCAP requires input from both a human operator and an algorithm
in order to perform scheduling tasks. The system focuses on a simulated aircraft carrier
deck environment, where numerous crewmembers and aircraft act simultaneously in conducting
flight operations (actual en route mission and strike planning are outside the
scope of this simulation). In this regard, the problem is one of resource allocation - the
set of aircraft on the deck must accomplish specific tasks that are enabled through the use
of limited resources, such as launch catapults, elevators, fuel stations, and a landing strip.
While the DCAP simulation is specific to carrier operations, there is little conceptual difference
between the task allocation performed in this system and that done in airport traffic
routing, supply chain management, or other logistics problems.

In DCAP, the operator has the choice of when to engage the system to replan the
schedule. The operator may choose to do so due to the occurrence of a failure in the system
(e.g., a catapult fails during launch operations) or due to the operator’s dissatisfaction
with the current schedule. During the replanning process, the human operator provides
guidance to the planning and scheduling algorithm through a set of displays. The algorithm
returns a proposal for a new, theoretically near-optimal operating schedule that incorporates
the operator’s instructions while also accounting for additional system constraints
(e.g., compensating for any failures that are present and ensuring that no aircraft
runs out of fuel). This proposed schedule is presented to the operator through modifications
to the display in which the inputs were created. This simulation environment also
includes an embedded vehicle routing system, implementing some collision avoidance
capabilities. This chapter describes the simulation environment and the DCAP system
components.

3.1.  The  Simulation  Environment 

The  DCAP  simulation  environment  is  intended  to  replicate  operations  on  United 
States  aircraft  carriers.  The  deck  environment  is  identical  in  layout  to  the  current  fleet  of 
Nimitz-class  carriers  and  is  shown  in  Figure  6. 


[Figure 6: diagram of the carrier deck with the following labeled areas]

Four Arresting Wires (black): used to stop the aircraft during landing.
Landing Zone (parallel green lines): area in which aircraft land during recovery
operations. These lines turn red when an aircraft is landing.
Three Elevators (white): move aircraft between flight deck and hangar deck.
Four Catapults (orange lines): launch aircraft off of the flight deck.

Figure 6. Basic Areas of the Aircraft Carrier Deck.

There are four forward-facing catapults, oriented in forward and aft pairs. Within each
pair, aircraft launch from the catapults in an alternating fashion; between the pairs, aircraft
may launch simultaneously. After launching, aircraft proceed to a mission area. After
mission completion, aircraft return to a holding pattern, known as the Marshal Stack
(MS), several miles away from the ship. Aircraft remain in the holding pattern until they
are given clearance to land. When clearance is given, aircraft exit the Marshal Stack individually
with consistent spacing. On landing, aircraft must catch one of the restraining
cables with a “tailhook,” which extends backwards from the bottom of the aircraft. If the
tailhook does not catch a wire, the aircraft must cycle back to the holding pattern before
attempting a second landing. It can also be seen in Figure 6 that the aft catapults (Catapults
3 and 4) are collocated with the landing strip, applying an additional constraint to
operations. This area can be used for either landing or launching aircraft, but not both. A
time penalty is also incurred when changing the deck configuration between launch and
landing, as the landing cables must be replaced or removed by personnel on the deck. The
simulation includes four different generic aircraft forms, modeled from realistic platforms.
These four vehicles are variations on fast/slow and manned/unmanned aircraft and
are listed in Table 4.
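
The catapult rules described above can be written down as two small predicates. This is a sketch under stated assumptions: the text names Catapults 3 and 4 as the aft pair, so Catapults 1 and 2 are assumed to form the forward pair, and the forward pair is assumed to remain usable while the aft area is configured for landing. The function names and the two-state `aft_area_mode` are hypothetical and are not taken from the DCAP implementation.

```python
FORWARD_PAIR = frozenset({1, 2})   # assumed numbering of the forward catapults
AFT_PAIR = frozenset({3, 4})       # collocated with the landing strip


def can_launch_together(cat_a, cat_b):
    """True if two distinct catapults may launch at the same time.

    Catapults in the same pair alternate; catapults in different pairs
    may launch simultaneously.
    """
    if cat_a == cat_b:
        return False
    same_pair = {cat_a, cat_b} <= FORWARD_PAIR or {cat_a, cat_b} <= AFT_PAIR
    return not same_pair


def catapult_available(catapult, aft_area_mode):
    """True if `catapult` may launch given the aft area's configuration.

    aft_area_mode is "launch" or "landing"; the aft catapults cannot be
    used while the shared area is configured for landing.
    """
    if aft_area_mode not in ("launch", "landing"):
        raise ValueError("aft_area_mode must be 'launch' or 'landing'")
    return not (aft_area_mode == "landing" and catapult in AFT_PAIR)
```

The time penalty for switching the deck between launch and landing configurations is not modeled here; a scheduler would need to add it whenever `aft_area_mode` changes.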


Table 4. Types of aircraft modeled in the DCAP simulation.

            Fast (higher flight speed;           Slow (lower flight speed;
            requires weapons)                    no weapons)

Manned      Fast Manned Aircraft (FMAC,          Slow Manned Aircraft (SMAC,
            based on F18)                        based on C2 Greyhound)
            Lowest endurance                     High endurance

Unmanned    Fast UAV (FUAV, based on             Slow UAV (SUAV, based on the
            X-47B Pegasus)                       MQ-1 Predator)
            Medium endurance                     Highest endurance

Fast aircraft have higher flight speeds and require weapons to be loaded before taking
off. They also have lower endurance (total possible flight time) than the slow aircraft.
Slow aircraft have lower maximum flight speeds, so they have a far lower fuel consumption
rate. Both UAV types have longer endurances than their manned counterparts, with
the SUAV having the highest endurance overall. The FMAC has the lowest endurance
but represents the largest proportion of the fleet.

Despite these differing characteristics, all aircraft taxi across the deck at the same
speed, roughly equivalent to human walking speed. This is a safety constraint on operations;
taxi speed is limited due to the high number of crew on deck (typically over 100
individuals on the 18,210 m² deck). Aircraft are the driving elements within the system
schedule; every other entity on the deck, including crew, can be seen as a resource utilized
by the aircraft to perform tasks. In the simulation, aircraft tasks describe the high-level
actions of the aircraft, such as “Taxi to parking spot” or “Takeoff from Catapult 2.” Tasks
are not given defined start and stop deadlines, as system complexity and constraints may
not permit actions to occur at precisely defined times. For example, as the schedule executes,
variations in process times and traffic congestion on deck lead to delays in the
schedule. A task that was originally given a start time of t might only be able to begin at
t + n due to limitations on the rate of fueling, transit, and other actions¹. Instead, advancement
to the next task is based on satisfying state conditions (e.g., the taxi task ends when
the aircraft reaches the desired final location). This accounts for the variety of interactions
that constrain operations on deck, such as the replanning of taxi routes, delays due
to lack of crew escorts, or constraints on launching aircraft simultaneously at adjacent
catapults. Additionally, in the simulation, several tasks are given variable processing
times sampled from Gaussian distributions² in order to model the variations seen in
real-life operations.

1 Planners (human or algorithm) are not allowed to command changes in task execution rates. The rate
at which a task occurs is either a set property of the resource or an inviolable safety constraint (e.g., taxi
speed for aircraft).
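The state-based task advancement described above can be illustrated with a minimal sketch. This is not the DCAP simulation code: the function name `execute_schedule`, the tuple layout, and the clipping of negative Gaussian draws at zero are assumptions made purely for illustration. Each task begins only once its predecessor has finished, so a task planned for time t may actually start at t + n.

```python
import random

def execute_schedule(tasks, rng):
    """Run one aircraft's tasks in order.

    tasks: list of (name, planned_start, mean_duration, std_duration).
    Returns a list of (name, planned_start, actual_start, finish).
    """
    clock = 0.0
    log = []
    for name, planned_start, mean, std in tasks:
        # A task cannot begin before its predecessor's end condition is met.
        actual_start = max(planned_start, clock)
        # Gaussian processing time; negative draws are clipped (assumption).
        duration = max(0.0, rng.gauss(mean, std))
        clock = actual_start + duration
        log.append((name, planned_start, actual_start, clock))
    return log

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same trace
    plan = [("fuel", 0.0, 10.0, 2.0), ("taxi", 8.0, 3.0, 0.5), ("launch", 12.0, 1.0, 0.1)]
    for name, planned, actual, finish in execute_schedule(plan, rng):
        print(f"{name}: planned {planned:.1f}, started {actual:.1f}, "
              f"delay n = {actual - planned:.1f}")
```

The planner in this sketch never changes a task's rate; it can only observe that the actual start slipped, which mirrors the restriction stated in footnote 1.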

As noted earlier, crewmembers are resources for aircraft to utilize, and their presence
is required for a number of operations. Seven different subsets of crew are modeled in the
simulation, each represented by the color of their uniform in the real world (Table 5).


Table 5. List of crew groups (by color) and roles.

Personnel Uniform Color    Role
Yellow                     Escorts and guides aircraft on deck
Brown                      Oversees plane maintenance; aids in escorting aircraft on deck
Blue                       Responsible for deck equipment; aids in escorting aircraft on deck
Red                        Handles weapons loading and unloading
Purple                     Responsible for fueling aircraft
Green                      Operates catapult and landing strip arresting wires
White                      Safety officers; serve only a monitoring role

In the DCAP simulation, each aircraft requires 1 yellow-, 1 brown-, and 2 blue-shirted
crewmembers to be present in order to taxi. Operating a given catapult requires a set of
5 green-shirted crewmembers, and operating the landing cables requires ten (the same ten
assigned to Catapults 3 and 4)³. Additionally, some wheeled vehicles, the Deck Support
Vehicles, also exist in the simulation, just as in the actual deck environment. In reality,
these vehicles would be tow trucks used for relocating aircraft. In this simulation,
they are modeled as futuristic unmanned weapons loaders, which assist red-shirted crew
in performing their tasks on deck⁴. The movement of these entities (aircraft, crew, and
Deck Support Vehicles) is animated for the user within a main display window, termed
the Carrier Display window; Figure 7 shows a close-up view of the deck from this window.
The Carrier Display serves as the foundation around which the remainder of the
DCAP system is built. The next section will detail the specific display elements in the
system.

2 Mean times of operation and distributions for fueling, landing, and takeoff procedures were taken
from interviews with subject matter experts including nearly two dozen instructors at a U.S. Naval training
base, each with several years of experience in deck operations. See Appendix A for further information on
these distributions.

3 In actual operations, only three crewmembers are required for the landing cables. This constraint was
modified within the simulation to serve a secondary function of moving these crew out of the landing zone
to prevent landing conflicts.


Figure 7. View of the Carrier Deck, showing crew, aircraft, and Unmanned Weapons Loaders.
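The crew requirements stated for the simulation (1 yellow, 1 brown, and 2 blue shirts to taxi; 5 green shirts per catapult; 10 green shirts for the landing cables) amount to a resource check before a task may start. The sketch below is an illustrative assumption of how such a check might look; the dictionary names and the `can_start` and `allocate` helpers are hypothetical and are not taken from the DCAP code.

```python
# Crew counts as stated in the text for the DCAP simulation.
TAXI_CREW = {"yellow": 1, "brown": 1, "blue": 2}
CATAPULT_CREW = {"green": 5}
LANDING_CREW = {"green": 10}

def can_start(required, available):
    """True if every required shirt color is available in sufficient number."""
    return all(available.get(color, 0) >= count for color, count in required.items())

def allocate(required, available):
    """Return the crew pool left after assigning `required`, or None if infeasible."""
    if not can_start(required, available):
        return None
    remaining = dict(available)  # copy so the caller's pool is not mutated
    for color, count in required.items():
        remaining[color] -= count
    return remaining

if __name__ == "__main__":
    free_crew = {"yellow": 1, "brown": 1, "blue": 2, "green": 5}
    print(can_start(TAXI_CREW, free_crew))     # taxi needs are met
    print(can_start(LANDING_CREW, free_crew))  # only 5 green shirts are free
```

A task whose crew cannot be allocated simply waits, which is one source of the delays "due to lack of crew escorts" mentioned earlier in this section.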

3.2.  Display  Elements 

The DCAP system utilizes a set of display elements to present information about the
current operating schedule, aircraft states, and system failures to the operator. The operator
then interacts with the automated system in order to arrive at a feasible plan of operations
that he or she finds acceptable. Bruni et al. provide a model for this collaborative
human-automation decision making, defining both the process (Figure 8) and roles for
entities in the system [111, 112].

4 This difference was influenced by the goals of the overall research program, which include testing
the inclusion of unmanned systems in the Naval environment.

The process begins with the acquisition of data from the world (Data Acquisition
block), which is then used by the Generator in a Data Analysis process. The result of this
analysis is used in an Evaluation step. This is guided by a Moderator who describes elements
of the solution to the Generator, makes sub-decisions that require refinement of the
solutions, and may request further data analysis. When the Moderator has created an
acceptable solution option (or set of options), it is sent to the Decider for approval. The solution
is then either accepted or rejected.


Figure 8. Model of Human-Automation Collaborative Decision Making [111].


Applying this model to DCAP, the algorithm plays the role of solution Generator,
while the human operator plays the roles of Moderator and Decider. This model of interaction
influenced the creation of three separate interface configurations for the DCAP
system: an Information Display configuration (Data Acquisition), a Plan Creation
configuration (Data Analysis and Request and sub-decisions for the Moderator), and a
Proposal Review configuration (Evaluation and Veto for the Decider). A Hybrid Cognitive
Task Analysis (hCTA) was used to generate specific function and information requirements
for each of these three interfaces. The hCTA process involves the creation of theorized
process flow diagrams of high-level operator actions within the system (e.g.,
“Monitoring” or “Replanning”). These process flow diagrams include segments for operator
processes, decisions, assumptions, and iterative loops. From the decision blocks,
decision trees can be created that further detail the decision making process, illuminating
the specific information required through the decision making process [113].

The result of the DCAP hCTA process was a set of functional and information requirements
that guided the development of three different display configurations⁵. In the
Information Display configuration, the display serves as an information acquisition and
display tool to support operator monitoring of the system and the decision on when to
create a new schedule (“replan”). This configuration directly supports the operator in the
Data Analysis + Request step. The second configuration, the Plan Creation configuration,
allows the operator to specify inputs and constraints for the new schedule to the algorithm,
supporting the operator in the role of Moderator. The third and final configuration,
the Proposal Review configuration, supports the operator in their role as the Decider,
while also allowing the user to return to the Moderator stage to alter or to provide additional
inputs to the algorithm. The following subsections will address each of these display
configurations individually.

3.2.1. Information Display Configuration

The Information Display configuration is the main configuration of the interface
(Figure 9). The Carrier Display shows the current location of all vehicles and crew on
deck. This frame can show either a close-up view of the deck (Figure 10), or a zoomed
out view for monitoring flight operations (Figure 11). The Marshal Stack Display (Figure
9) shows the current list of aircraft waiting to land and the landing order. Individual aircraft
schedules appear in the Aircraft Schedule Panel (ASP), a vertical list on the right
side of the screen (Figure 9). The Deck Resource Timeline (DRT) at the bottom of the
screen shows the allocation of tasks for the four catapults and the landing strip (Figure 9).
These two sections of the interface also convey information on aircraft and deck resource
failures to the user. The remaining features of the interface are supporting features, such
as sort options and legends. The user initiates replanning by pressing the “Request Schedule”
button at the upper right corner of the screen. This shifts the display interface to the
Plan Creation Configuration.

5 Details concerning the DCAP hCTA can be found in Appendix B. A tutorial on using the interface
can be found in Appendix C.


Figure 9. Information Display Configuration of the DCAP Interface. Labeled elements: Marshal
Stack Display, Carrier Display, Variable Ranking Reminder, Aircraft Schedule Panel (ASP),
Deck Resource Timeline (DRT), Crew Legend, Timeline Sort Options, Timeline Legend, and
Variable Rankings.
Figure 11. Full image of Carrier Display, "General Overview."
3.2.2. Plan Creation Configuration


The  Plan  Creation  configuration  allows  the  user  to  define  weights  for  the  planner’s 
objective  function  as  well  as  a  set  of  additional  constraints  on  the  solution.  The  creation 
of  objective  function  weights  is  done  by  ranking  the  relative  priority  of  a  set  of  personnel 
groups  within  the  environment  (e.g.  deck  aircraft  or  crewmembers).  For  example,  in  the 
aircraft  carrier  environment,  the  mission  focus  alternates  between  launching  and  landing 
aircraft.  Planning  priorities  can  be  continually  adjusted  to  reflect  these  changes.  At  other 
times,  concerns  for  the  workload  of  the  crew  and  support  vehicles  on  deck  may  arise  and 
further  modify  mission  priorities.  Having  a  single,  consistent  definition  of  priority  levels 
does  not  effectively  capture  the  complexity  of  the  environment. 

Constraints  are  created  by  assigning  priority  ratings  to  specific  aircraft,  then  defining 
a  desired  schedule  for  each  aircraft.  These  two  actions  (ranking  personnel  groups  and 
assigning  aircraft  priority  designations)  can  be  done  in  any  order,  but  both  must  be  done 
before  the  automated  planner  can  begin  its  computations.  This  section  will  describe  the 
ranking  of  the  personnel  groups  first,  and  then  will  describe  the  definition  of  individual 
aircraft  priority. 

3.2.2.1. Relative Priorities in the Variable Ranking Tool

Entering the Plan Creation Configuration first allows the operator the option to bring
up an additional frame, the Variable Ranking Tool (Figure 12), to define a set of priorities
for four personnel groups on deck. These four groups are defined to be Airborne Aircraft
(AA), Deck Aircraft (DA), Crew Working on deck (CW), and Deck Support vehicles
(DS).
The goal of this ranking process is to allow the operator to specify the relative importance
of each of these groups to the planning algorithm. Ranking is done via a drag-and-drop
interface using five levels of priority. Including this as a step in the plan creation process
allows the operator flexibility in modifying the algorithm’s objective function in light of
changing conditions on the aircraft carrier deck. This ranking can occur in any manner the
operator desires: placing variables on separate levels, all on a single level, or any variation
in between. The level of ranking corresponds to a numerical weight for the objective
function; the highest ranked variables receive a weight of 5, the lowest ranked receive a
weight of 1. The operator clicks “Submit” to save the personnel group rankings and transmit
them to the algorithm.

Figure 12. Ranking interface for the four system variables.
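The mapping from ranking level to objective-function weight can be sketched as follows. The text states only that the highest level receives a weight of 5 and the lowest a weight of 1; the linear mapping for the intermediate levels, the level numbering (1 = highest), and the function name `weights_from_levels` are assumptions made for illustration.

```python
GROUPS = ("AA", "DA", "CW", "DS")  # Airborne Aircraft, Deck Aircraft, Crew Working, Deck Support
NUM_LEVELS = 5                     # five priority levels in the Variable Ranking Tool

def weights_from_levels(levels):
    """Convert ranking levels (1 = highest, 5 = lowest) into weights (5 down to 1).

    Several groups may share a level, as the interface allows.
    """
    weights = {}
    for group in GROUPS:
        level = levels[group]
        if not 1 <= level <= NUM_LEVELS:
            raise ValueError(f"level for {group} must be in 1..{NUM_LEVELS}")
        # Assumed linear mapping: level 1 -> 5, level 2 -> 4, ..., level 5 -> 1.
        weights[group] = NUM_LEVELS + 1 - level
    return weights

if __name__ == "__main__":
    # Example: a landing-focused phase where airborne and deck aircraft share top priority.
    print(weights_from_levels({"AA": 1, "DA": 1, "CW": 3, "DS": 5}))
```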

3.2.2.2. Individual Priorities in the Aircraft Schedule Panel

This configuration also allows the user to specify aircraft-specific constraints in the
Aircraft Schedule Panel (ASP, Figure 12). Pressing “Request Schedule” causes checkboxes
to appear next to each aircraft box in the ASP. Clicking a checkbox designates the
corresponding aircraft as a priority consideration for the automated planner, as seen in
Figure 13. Additionally, this action causes the aircraft timeline to split horizontally into
halves. The upper half displays the aircraft’s current operating timeline, while the bottom
half can be manipulated by the operator and shows a projected schedule based on the
operator’s preferences. Figure 13 provides an example of an aircraft that has been designated
as having priority status (the checked box on the left) with a suggestion to significantly
delay operations (the bottom timeline has been moved to the right). In this figure,
the Predator UAV is about to begin taxi operations on deck in preparation for a mission
(the upper timeline). In the bottom timeline, the operator has requested that this aircraft
delay these operations for an additional 15 minutes.

Figure 13. Example of priority definition and suggestion of an operating schedule for an
aircraft in the ASP.

The operator has the flexibility to specify as many or as few priority aircraft as desired,
and may adjust the schedules of all or none of these aircraft. Once all desired
changes are made, the operator presses “accept” to submit this information to the automated
planning algorithm. The inputs from the VRT and ASP are then utilized simultaneously.
The overall schedule is optimized according to the weights from the VRT while
also satisfying the constraints on aircraft schedules. After the automated planning algorithm
has finished its computations, the proposed schedule is returned to the display elements
and shown in the Proposal Review Configuration, discussed in the next section.

3.2.3.  Proposal  Review  Configuration 

After  finishing  its  computations,  the  automated  algorithm  returns  a  proposed  schedule 
to  the  system  to  be  displayed  for  operator  approval  (Figure  14). 




Figure 14. Proposal Review configuration. Labeled elements: Marshal Stack, Carrier Display,
Variable Ranking, Aircraft Schedule Panel, Deck Resource Timeline, Crew Legend, Timeline
Sort Options, and Timeline Legend.


The proposed schedule is shown using modifications of the basic display elements.
The convention used is similar to that of the Plan Creation Configuration, in
which the human operator is allowed to make suggestions in the lower half of each aircraft’s
timeline while the upper half continues to show the current operating schedule. In the
Proposal Review Configuration, aircraft timelines in the ASP remain split into upper and
lower halves. The upper half continues to show the current operating schedule, but the
lower half now shows the algorithm’s proposed schedule for that aircraft. Additionally, a
second Deck Resource Timeline appears below the first, utilizing the same convention: the
upper timeline shows current operations while the lower shows the proposed schedule.
This allows the human operator to easily identify the differences between current and
proposed schedules for each of these timelines.
An additional display window in this configuration is the Disruption Visualization Tool
(DVT, Figure 15). This configural display [114-116] displays comparisons of active operating
time for the four variable groups (Airborne Aircraft, Deck Aircraft, Crew Working, and
Deck Support vehicles). Each quadrant of the diamond maps to the ratio of active time for
the proposed schedule to the active time required for the current schedule for an individual
variable group. The dashed line denotes a ratio value of one, meaning that no change
occurred between the proposed and current schedules. Lower ratios (smaller green triangles,
whose edge is inside the dashed line) imply that the algorithm was able to schedule
tasks for that group more efficiently. Higher ratios (larger red triangles, whose edge is
outside the dashed line) imply that the algorithm was unable to do so, due either to operator
specifications or a degraded system state (such as an accumulation of delays in the
schedule). For the image in Figure 15, the proposed schedules for both the Airborne (upper
left) and Deck Aircraft (upper right) are more efficient than the current schedules.
The proposed schedule for the Crew (bottom left) is much less efficient, while the proposed
schedule for the Deck Support vehicles is only marginally less efficient. This display
does not include error and warning messages, such as exceeding the total acceptable
working time, and is only meant to provide a simple, easy-to-understand depiction of the
relative cost of the new plan. Such warning and alerting displays are left for future work.

Figure 15. Disruption Visualization Tool (DVT).
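The quantity plotted in each quadrant of the DVT, the ratio of proposed to current active time, can be sketched as below. The function names, the color labels, and the numeric tolerance are illustrative assumptions; the sketch also assumes every group has a nonzero current active time.

```python
def disruption_ratios(current, proposed):
    """Ratio of proposed to current active time for each variable group."""
    return {group: proposed[group] / current[group] for group in current}

def classify(ratio, tol=1e-9):
    """Map a ratio to the triangle color described for the DVT."""
    if ratio < 1.0 - tol:
        return "green"      # inside the dashed line: less active time than now
    if ratio > 1.0 + tol:
        return "red"        # outside the dashed line: more active time than now
    return "unchanged"      # on the dashed line: ratio of one

if __name__ == "__main__":
    # Hypothetical active times (minutes) for the four groups; not data from the study.
    current = {"AA": 100.0, "DA": 80.0, "CW": 50.0, "DS": 40.0}
    proposed = {"AA": 90.0, "DA": 60.0, "CW": 75.0, "DS": 40.0}
    for group, ratio in disruption_ratios(current, proposed).items():
        print(group, round(ratio, 2), classify(ratio))
```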
The goal of the Proposal Review Configuration is to display sufficient information to
the operator to determine whether the proposed schedule should be accepted, modified, or
rejected. When the user decides that the plan is worthy of acceptance, the proposed
schedule is accepted and the system resets to the Information Display configuration. The
preceding sections have discussed the actions taken by the operator, but have not discussed
the automated algorithm and how it handles these inputs. A brief discussion of this is
provided in the next section.

3.3.  The  Automated  Planner 

The current automated algorithm in use in the DCAP system⁶ is an Integer Linear
Program (ILP) [45]. Generally, Linear Programming (LP) algorithms function by minimizing
a given cost function while simultaneously satisfying a set of constraints defined
for the problem space. The cost function is generally a summation of a set of variables,
each assigned a different scoring weight. Constraint satisfaction is typically modeled by
defining an upper bound for several summations (e.g., sum of all x should be less than 1).
An example ILP formulation appears below:

\[
\begin{aligned}
\text{minimize} \quad & c^{T} x \\
\text{subject to} \quad & Ax = b \\
& x \geq 0
\end{aligned}
\tag{1}
\]

where $c^{T}$ is a matrix of weighting values and $x$ is a matrix of system variables. The
variables $A$ and $b$ are matrices used to define constraints in the system. In the case of DCAP,
$c^{T}x$ represents the total operational time, which the planner minimizes. This was selected
since minimizing active time also minimizes fuel consumption (fuel is a limited resource) and
maximizes safety (less time in operations implies fewer chances for accidents to occur)⁷.
The matrix $c^{T}$ is populated by the rankings in the Variable Ranking Tool (Airborne Aircraft,
Deck Aircraft, etc.). The corresponding entries in $x$ contain the total active time of
each variable group (i.e., the total man-hours of labor for Deck Aircraft). The matrices $A$
and $b$ consist of additional weights on $x$ and bounds on values, respectively. A constraint
applied to a single aircraft’s fuel level at landing may take the form of

\[
x \geq 0.20
\tag{2}
\]

where $A$ is equal to 1 and $b$ is equal to 0.20 (20%). This constraint dictates that the
aircraft’s fuel level at landing (a member of $x$) should be at least 20% of the maximum fuel
level. This would be an example of a “hard” constraint utilized by the planning algorithm⁸.

6 The DCAP system is modular with respect to the automated algorithm. Any algorithm can be utilized
in the system, as long as it is adapted to accept the appropriate inputs and outputs. Future testing and
validation will utilize MDPs and Queuing Network-based policy generators.
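To make the weighted-cost formulation concrete, the sketch below solves a tiny integer program by exhaustive enumeration. It is an illustration only and is not the solver described in [45]; the function name `solve_small_ilp`, the search bound `upper`, and the example numbers are all assumptions. Equality constraints are treated as hard constraints: any candidate that violates one is discarded.

```python
from itertools import product

def solve_small_ilp(c, A, b, upper):
    """Minimize c.x subject to A x = b with integer 0 <= x_j <= upper.

    Exhaustive search, suitable only for very small illustrative problems.
    Returns (best_x, best_cost), or (None, None) if no feasible point exists.
    """
    best_x, best_cost = None, None
    for x in product(range(upper + 1), repeat=len(c)):
        # Hard constraints: every row of A x must equal the matching entry of b.
        feasible = all(
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) == b_i
            for row, b_i in zip(A, b)
        )
        if not feasible:
            continue
        cost = sum(c_j * x_j for c_j, x_j in zip(c, x))
        if best_cost is None or cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

if __name__ == "__main__":
    # Two groups with weights 5 and 3 must share 4 units of active time.
    print(solve_small_ilp(c=[5, 3], A=[[1, 1]], b=[4], upper=4))
```

In this toy instance the search assigns all of the required time to the lower-weighted group, which is the behavior the weights are meant to induce: time spent by highly ranked groups is penalized more heavily.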

Inputs from the Aircraft Schedule Panel are used as “soft” constraints on the system.
The heavily constrained nature of the system implies that an operator’s desired
schedule for an aircraft may not be possible, as changes to a single aircraft’s schedule
could affect the entire system. To account for this, the planning algorithm treats the
suggested schedule as a soft constraint: the algorithm attempts to minimize the total
difference between the desired task times (as input by the user) and those returned in the new
schedule solution. Treating this as a hard constraint would force the system to incorporate
these suggestions exactly as specified, which may not be possible due to the complexity
of the system and the dynamics of the environment. The user, unable to accurately predict
the evolution of the system, would then be suggesting a series of infeasible schedules to
the algorithm. Minimizing the overall difference in start times between the suggested and
the returned schedule allows the algorithm to provide a feasible schedule that adheres as
closely to the suggested schedule as possible. Although the system does not currently
highlight instances of infeasibility to the user, this will be a topic of future work.

7 This concern is also reflected in interviews with Naval personnel. When schedules degrade and
require replanning, the personnel stated that their main concern is executing aircraft tasks as fast as possible
while maintaining safe operations.

8 Hard constraints should never be violated by the planning algorithm. For instance, if the specification
is to remain less than or equal to 0.20, the value should never reach 0.21. Soft constraints are more flexible,
guiding the algorithm to a certain objective but not requiring the objective to be satisfied. As noted in the
text, one form of this is to minimize a value without placing bounds on the value.
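The difference between hard and soft constraints can be sketched with a small selection routine. This is an assumption-laden illustration rather than the DCAP planner: candidate schedules are enumerated explicitly, `is_feasible` stands in for the hard constraints, and the operator's suggested start times enter only through a deviation term that is minimized.

```python
def total_deviation(desired, schedule):
    """Sum of absolute differences between desired and scheduled start times."""
    return sum(abs(schedule[task] - start) for task, start in desired.items())

def choose_schedule(candidates, desired, is_feasible):
    """Pick the feasible candidate closest to the operator's suggestion.

    Hard constraints (is_feasible) filter candidates outright; the suggestion
    is a soft constraint and only influences the ranking of what remains.
    """
    feasible = [s for s in candidates if is_feasible(s)]
    if not feasible:
        return None
    return min(feasible, key=lambda s: total_deviation(desired, s))

if __name__ == "__main__":
    desired = {"taxi": 15}                        # operator asks for a start at minute 15
    candidates = [{"taxi": 15}, {"taxi": 20}, {"taxi": 30}]
    crew_free_at = 18                             # hypothetical hard constraint
    print(choose_schedule(candidates, desired, lambda s: s["taxi"] >= crew_free_at))
```

Here the operator's exact request is infeasible, so the routine returns the nearest feasible start time instead of failing, which is the behavior the soft-constraint treatment is intended to provide.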

A formal testing of the algorithm on several sample problems, as well as comparisons
to additional well-known LP solvers, can be found in [45]. While the measures included
in Banerjee et al. [45] suffice for analyzing the performance of the algorithm on its own,
this testing must be repeated once the algorithm is integrated with the P/RA system. This
is due both to a change in the specific problem domain, which has characteristics different
from the test problem, and to the inclusion of the human operator in the system,
whose inputs of priorities and constraints will affect algorithm performance. An analysis
of algorithm performance under these circumstances appears in Chapter 5.3.

3.4.  Chapter  Summary 

This  chapter  has  provided  a  description  of  the  simulation  environment,  described  the 
layout  of  the  interface  and  how  an  operator  interacts  with  the  system  to  create  a  new  plan, 
and  provided  a  brief  description  of  the  current  automated  planning  algorithm.  The  goal  of 
this  chapter  was  to  describe  the  characteristics  of  the  environment  and  the  methods  of 
human  interaction  in  order  to  motivate  the  metrics  and  testing  program  developed  for  the 
DCAP  system.  The  following  chapter  describes  the  definition  of  these  metrics. 
4. Performance Validation Testing


In focusing on validating the performance of the system, and specifically the adequacy
of the algorithm in supporting the system, a comparison between the DCAP
Human-Algorithm (HA) planning system and its real-world counterpart is required. This
comparison is difficult to perform due to limitations in access to detailed operational logs
in the aircraft carrier environment, as well as differences between the simulation environment
and real-world operations. This latter confound is difficult to avoid, as DCAP is
a revolutionary system: it has no real-world predecessor and thus must be modeled in a
simulated environment. However, a comparison to real-world operations can still be performed
by comparing the performance of DCAP’s HA-generated plans to plans generated
by real-world Subject Matter Experts (SMEs) that work in the aircraft carrier environment.
These SME-based plans are generated without the assistance of the planning algorithm
and are referred to as Human-Only (HO) plans. By executing these HO plans
within the DCAP simulation environment, the decision-making strategies of the users are
preserved while also ensuring that these plans operate under the same environmental
constraints and limitations (due to the simulation environment) as the HA-generated plans.

This chapter will first provide an overview of the testing protocol, which utilizes a
single Expert User who applies a set of SME heuristics to guide his or her interactions
with both the HO and HA planning conditions. This section will also provide definitions
for the scenarios used in the testing program, as well as a statistical power analysis to
determine the number of required trials. The final sections detail the measurement metrics
defined for the system and the testing apparatus utilized in this experimental program.
4.1. Testing Protocol


In order to compare the performance of the DCAP system to the human-generated,
SME-based plans, a series of realistic test scenarios was created. As one major purpose of
the DCAP system is to replan schedules in the case of a disruptive failure of aircraft or
deck resources, each scenario included at least one form of failure. Additionally, as testing
across varying complexity levels is an important aspect of algorithm validation [24,
25], three scenarios addressing different levels of complexity were designed and are
discussed in Chapter 4.1.2. Applying the human-generated, SME-based planner (the
Human-Only, or HO, planning condition) and the DCAP planner (the Human-Algorithm, or
HA, planning condition) to these scenarios allows for a relative comparison of the
performance of the two, but provides no objective comparison point to ground the analysis.
A third planning condition, the no-replan Baseline condition (B), provides this perspective. In
this planning condition, each scenario happens as scheduled without replanning. This
provides an objective, independent measuring point for establishing planner performance
and internal validity⁹ within the testing scenarios. In the case of the latter, there is no
guarantee that the Baseline schedules, as designed, are near optimal. If the Baseline
schedules are not near optimal, the possibility exists that the HO and HA planners may
submit schedules that outperform the Baseline in critical metrics. Measuring all three
cases allows analysts to determine the level of validity of the Baseline cases as designed,
and poor results may lead to changes in the testing scenarios.


9  An  experiment  exhibits  internal  validity  if  the  tests  performed  truly  measure  the  variables  of  interest 
and  the  results  cannot  be  produced  from  other  spurious,  uncontrolled  factors. 
In conducting these tests, the inclusion of multiple users, even if all are guided by
the same SME planning strategies, causes a confound in the examination of the performance
of the planning algorithm. In this case, it becomes difficult to analyze the performance
of the algorithm on its own, as variations in user input and strategies may directly
cause variations in algorithm performance. The utilization of a single individual
minimizes the variability in interaction that would be seen with a large group of human
test subjects and allows for a more precise inspection of algorithm performance in the
DCAP system. Even so, a single individual’s actions may vary among different trials. In
order to remove these variations, the Expert User’s actions were scripted and based upon
a defined set of SME Heuristics, developed from interviews with Naval personnel. These
are detailed in the following section.

4.1.1.  Subject  Matter  Expert  Heuristics 

Throughout the design process of the DCAP system, a variety of Naval personnel
were consulted. This included over two dozen individuals, encompassing former Naval
aviators, a former member of an Air Wing Commander’s planning staff, and two commanders
of a training base for deck crewmen. In meetings that occurred in person, participants
were presented with example scenarios that could occur in real-life operations
and were asked what their responses to the situations would be. Through these guided
interviews, the DCAP research team was able to identify relative consistency in solution
generation despite a lack of standardized training for replanning carrier operations [117].
These rules, or heuristics, are shaped by human experience and are used to simplify the
problem at hand, allowing users to come to solutions quickly [13, 118, 119]. The list of
heuristics appears in Table 6, grouped according to three general categories (General,
Deck, and Airborne) but not in order of importance.

Table 6. Aircraft Carrier expert operator heuristics.

General
(1) Minimize Changes
(2) Cycle aircraft quickly, but maintain safety
(3) Halt operations if crew or pilot safety is compromised

Deck
(4) Maintain an even distribution of workload across the deck
(5) Make available as many deck resources as possible
(6) When moving aircraft on deck, maintain orderly traffic flow through the center of the deck

Airborne
(7) Marshal Stack populated according to fuel burn, fuel level, then miscellaneous factors
(8) Park aircraft for maximum availability next cycle
(9) “True” vs. “Urgent” Marshal Stack emergencies


General heuristics are applied to any and all replanning scenarios. They are: minimize changes
in the schedule (Heuristic 1), work quickly and safely (Heuristic 2), and halt operations if any
human being is placed in immediate physical danger (Heuristic 3).

For Deck heuristics, the concerns are to balance workload on deck (Heuristic 4) due to concerns
of crew workload and the maintainability of the deck equipment, to ensure maximum flexibility in
operations by keeping all resources available, if possible (Heuristic 5), and to keep orderly
motion on the deck by focusing movement in the interior of the deck (Heuristic 6).

Airborne heuristics deal with the ordering of aircraft in the landing order (Heuristic 7), where
they should be parked after landing (Heuristic 8), and how to handle failures for airborne
aircraft (Heuristic 9). Applying Heuristic 9 to an airborne aircraft requires understanding the
nature of the failure and its criticality. True emergencies must be dealt with immediately, as
they endanger the pilot and the aircraft. Urgent emergencies are of concern, but if compensating
for these failures causes further schedule degradation or requires numerous changes on deck,
operators may delay action until a more satisfactory time.

These expert heuristics were reviewed by the previously interviewed Naval personnel in the form
of a teach-back interview [120]. That is, the interviewees were presented with a problem
scenario, to which the interviewer applied the heuristic in question. The interviewer would
describe the heuristics and what their resulting plan would be. The interviewee would then
validate the proposed action, possibly suggesting further details or a slight differentiation in
the heuristic. The final set of heuristics thus allows a non-expert user to generate approximately
the same solutions as a more experienced subject matter expert.

4.1.2.  Scenario  Definition 

Complexity is known to cause variations in algorithm performance, due in part to the brittleness
inherent to automated algorithms [9]. As such, testing across a range of complexity is considered
a necessity for the full validation of an algorithm [24, 25], even if this only establishes bounds
on algorithm operation. The complexity of a system can be described either objectively (through
some standardized, logical method) [121-123] or subjectively (through the views of separate
individuals) [124-126]. Scalability [127], particularly load scalability, can also be used as a
form of complexity for the system. This involves testing over a range of load sizes (for DCAP,
the load size is the number of aircraft). However, due to physical space constraints, the aircraft
carrier environment has a hard upper bound on the number of aircraft (as well as crew and deck
support vehicles) that can exist at any given time. Subjective evaluations also may vary widely;
therefore an objective description based on the number of applied SME heuristics was used in
order to provide a stable, common definition of complexity. The following subsections describe
the three scenarios defined for the testing process (Simple, Moderate, Complex) and list the
applicable heuristics required for each (a total of 4, 5, and 7, respectively). A description of
the actions taken by the user in replanning for these scenarios appears in Appendix D.

4.1.2.1. Simple Scenario

The  Simple  scenario  models  the  occurrence  of  a  catapult  failure  on  deck  during 
launch  operations  and  has  four  applicable  expert  user  heuristics,  detailed  below.  Twenty 
aircraft  (2  SMAC,  2  SUAV,  12  FMAC,  4  FUAV)  are  fueled  and  have  weapons  loaded 
while  parked  on  the  deck.  Aircraft  then  proceed  to  launch  catapults,  queuing  in  lines  of 
no  more  than  three  (similar  to  real  operations)  at  each  launch  catapult.  After  launching 
from  the  catapult,  aircraft  proceed  to  a  mission  area. 

Aircraft launch assignments are initially distributed across catapults. Catapult 1 remains
inaccessible for the entirety of the scenario due to several aircraft parked in the immediate
vicinity. Exact times are not predictable due to the stochasticity in processing times of fueling
and launching aircraft, as noted earlier in Chapter 3. While estimates of mean time and standard
deviation for each of these Gaussian processing times can be obtained and summed to form a new
Gaussian model, additional variability and stochasticity exist due to the route planning system.
The route planner’s actions are guided by the location of aircraft at each point and cannot be
adequately modeled as a Gaussian distribution. As such, the processing times can be highly
variable and do not exhibit a standard distribution form.
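
As a worked form of the summation mentioned above, and assuming the individual processing times are mutually independent (an assumption that the text does not state), the sum of n Gaussian processing times with means \mu_i and standard deviations \sigma_i is itself Gaussian. The symbols T_i, \mu_i, and \sigma_i are introduced here for illustration only:

```latex
T = \sum_{i=1}^{n} T_i, \qquad
T_i \sim \mathcal{N}\!\left(\mu_i, \sigma_i^{2}\right)
\;\Longrightarrow\;
T \sim \mathcal{N}\!\left(\sum_{i=1}^{n} \mu_i,\; \sum_{i=1}^{n} \sigma_i^{2}\right)
```

Under this assumption the variances add, so the standard deviation of the total is \sqrt{\sum_i \sigma_i^2} rather than the sum of the individual standard deviations. The contribution of the route planner is not captured by this expression.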

A failure occurs in the system 555 seconds after simulation start (total scenario length
approximately 1800 seconds), in the window of time after the launch of the SUAV aircraft from
Catapult 4 but before the launch of the SMAC aircraft from Catapult 3. This failure incapacitates
Catapult 3 for the remainder of the simulation. Replanning after this failure should address the
reassignment of aircraft to the remaining accessible and operational catapults. The scenario
terminates when all aircraft have departed the carrier deck and reached their mission location.

This scenario is identified as Simple because replanning for the system requires the application
of only four expert user heuristics. Heuristics 1 (Minimize changes to the schedule) and 2 (Cycle
aircraft quickly, but maintain safety at all times) apply to most situations. Additionally, Deck
heuristics 4 (Maintain an even distribution of workload on the deck) and 6 (When moving aircraft
on deck, maintain orderly flow through the center of the deck) apply. This results in SMEs moving
all aircraft from the failed Catapult 3 forward to Catapult 2 while also attempting to balance the
number of aircraft at the two remaining functional catapults (Catapults 2 and 4). The naive user
action might simply move aircraft to the closest catapult; in this case, aircraft at the failed
Catapult 3 would be sent to Catapult 4. This not only overloads Catapult 4, but also requires the
crew to turn and move the aircraft aft, which is more difficult to manage than a taxi forward. In
the minds of the SMEs, moving the aircraft forward minimizes the complexity of and risk
associated with changing aircraft catapult assignments on the deck.




4.1.2.2. Moderate Scenario


The Moderate scenario involves the application of five expert heuristics and models a recovery
task. In this scenario, all aircraft begin at their mission location and immediately begin
returning to the Marshal Stack to land. This scenario also utilizes twenty aircraft (2 SMAC, 2
SUAV, 12 FMAC, 4 FUAV), which are timed to enter the Marshal Stack with a very tight spacing
based on Heuristic 7 (populate the Marshal Stack according to fuel burn rate, fuel level, and
maintenance requirements). FMAC aircraft enter first, followed by SMAC aircraft, then FUAV
aircraft, then SUAVs. Two failures are introduced just before aircraft enter the Marshal Stack:
an FMAC has a hydraulic failure, while an SMAC has a fuel leak. Replanning should lead to a
reordering of aircraft in the Marshal Stack that ensures both aircraft land before encountering a
limit violation on their hydraulic fluid and fuel, respectively. Replanning for this scenario
should also address Heuristics 1, 2, 3 (The safety of pilots and crew overrides all, even if it
requires stopping operations momentarily) and 9 (Differentiate between “True” emergencies, which
must be handled immediately, and “Urgent” emergencies, which could be delayed if needed). In this
case, the SMEs move the SMAC (fuel leak) forward in the Marshal Stack to minimize the risk of
this aircraft running out of fuel. However, the nature of the hydraulic failure increases the
possibility of the FMAC crashing on landing. This would disable the landing strip for an extended
period of time while crew cleared the wreckage and prepared the landing strip for operation.
Moving the FMAC backwards in the Marshal Stack allows additional aircraft to land and thus
minimizes the potential repercussions of a crash, if one occurs. The naive user may not
understand this constraint and may instead send both failed aircraft forward in the Marshal
Stack, increasing the chance of causing major disruptions in task execution for the remaining
aircraft.

4.1.2.3. Complex Scenario

The Complex scenario models aspects of a mixed launch/landing event that requires the application
of seven expert user heuristics. The two previous test scenarios focused on only one aspect of
the launch and landing (recovery) of aircraft in the aircraft carrier environment. The Complex
scenario focuses on both aspects, addressing a case where emergency launches are requested in
the midst of landing operations. This scenario begins similarly to the Moderate scenario, with
twenty aircraft (2 SMAC, 10 FMAC, 6 FUAV) returning from mission. The order of entry is slightly
different from that of the Moderate scenario; here, FUAVs enter the Marshal Stack first, followed
by FMACs and SMACs. In the midst of return operations, a supervisor requests the launch of
additional reconnaissance aircraft. Also, aircraft begin the scenario with lower fuel levels as
compared to the Moderate scenario, which greatly increases the chances of encountering low fuel
emergency conditions in this Complex scenario.

In this case, two additional SUAVs launch from the flight deck. In launching these aircraft, only
Catapults 2, 3, and 4 are available (just as in the Simple Scenario, aircraft are parked over
Catapult 1, making it inaccessible). Just as this request is fielded, a fuel leak arises in a
SMAC just arriving in the Marshal Stack. This creates conflicting priorities for scheduling: the
Carrier Air Wing Commander (CAG) has requested that these aircraft be launched immediately, but
the fuel leak must also be addressed relatively quickly. However, the use of Catapults 3 and 4
may lead to conflicts with aircraft incoming to land. Addressing this scenario will require the
application of heuristics 1, 2, 3, 4, 6, 7, and 9 (Table 11). The naive user solution in this
case may be to send one aircraft each to a forward and an aft catapult (e.g., to Catapults 2 and
4). While this may guarantee a faster launch time, any potential delays in the launch at the aft
catapult increase the likelihood of an aircraft on approach being forced to abandon landing. This
incurs greater fuel cost and increases the risk associated with this aircraft. The SME solution
requires two actions: moving the failed SMAC forward in the landing order (to minimize the chance
of running out of fuel) and sending the launching aircraft to the forward catapult (Catapult 2)
only. Utilizing only this catapult ensures that, regardless of the time required to launch,
aircraft on approach do not experience any interference in the landing strip area. In this case,
efficiency in launching is sacrificed to minimize the risk for the airborne aircraft.
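
The heuristic-count definition of complexity used for the three scenarios can be tabulated in a few lines. The following is a minimal sketch, not part of the DCAP software: the heuristic numbers are those named in the scenario descriptions above, while the names `SCENARIO_HEURISTICS`, `complexity`, and `rank_by_complexity` are hypothetical and introduced only for illustration.

```python
# Heuristic numbers follow Table 6; the sets are those named in the
# scenario descriptions. All identifiers here are illustrative only.
SCENARIO_HEURISTICS = {
    "Simple": {1, 2, 4, 6},
    "Moderate": {1, 2, 3, 7, 9},
    "Complex": {1, 2, 3, 4, 6, 7, 9},
}


def complexity(scenario):
    """Objective complexity: the number of SME heuristics a scenario requires."""
    return len(SCENARIO_HEURISTICS[scenario])


def rank_by_complexity():
    """Scenario names ordered from fewest to most applicable heuristics."""
    return sorted(SCENARIO_HEURISTICS, key=complexity)


if __name__ == "__main__":
    for name in rank_by_complexity():
        print(name, complexity(name), sorted(SCENARIO_HEURISTICS[name]))
```

Representing each scenario as a set of heuristic numbers keeps the complexity measure tied directly to the heuristics listed in the text rather than to a separately maintained count.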


4.1.3.  Statistical  Power  Analysis 

Power tests performed for the DCAP system resulted in a sample size of 30 trials per planning
condition and scenario combination. This resulted in a total of 270 required trials (30 trials x
3 scenarios x 3 planning conditions), with two-thirds of these requiring direct intervention by a
human operator. The HO planning condition requires an individual to apply the SME planning
heuristics to the scenario, while the HA planning condition requires an individual to interact
with the DCAP planning algorithm to effect a replan in the system. The final third (the Baseline
plan) represents a nominal schedule with no failures or need to replan.
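
The trial count follows directly from the full crossing of scenarios and planning conditions. The sketch below simply enumerates that design; the condition labels are taken from the text, while `trial_plan` and the constant names are hypothetical.

```python
from itertools import product

TRIALS_PER_CELL = 30                      # from the power analysis
SCENARIOS = ("Simple", "Moderate", "Complex")
CONDITIONS = ("Baseline", "HO", "HA")     # planning conditions named in the text
HUMAN_CONDITIONS = {"HO", "HA"}           # conditions needing a human operator


def trial_plan():
    """Every (scenario, condition, trial number) cell of the full design."""
    return [
        (scenario, condition, trial)
        for scenario, condition in product(SCENARIOS, CONDITIONS)
        for trial in range(1, TRIALS_PER_CELL + 1)
    ]


plan = trial_plan()
total_trials = len(plan)
human_trials = sum(1 for _, condition, _ in plan if condition in HUMAN_CONDITIONS)

if __name__ == "__main__":
    print(total_trials, human_trials)
```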

4.2. Definition of the Metrics


The five metric classes developed by Pina et al. [26, 27] provide the framework for defining
metrics to measure the performance of the DCAP system. These originally addressed measures of
Mission Efficiency, Algorithm Behavior Efficiency, Human Behavior Efficiency, Human Behavior
Precursors, and Collaborative Metrics. As the examination of the multiple user case is outside of
the scope of this research, several metric subclasses were removed. Metrics still exist for four
of the five classes defined by Pina et al. (only Collaborative metrics are removed entirely, due
to the use of only a single Expert User who is intimately familiar with the system). Detailed
definitions of these metrics will be discussed in the following subsections.


4.2.1.  Mission  Efficiency  Metrics 

Pina et al. divided mission performance metrics into three categories of time-based, error-based,
and coverage-based metrics. For the aircraft carrier environment, these respectively address the
Expediency with which operations occur, the level of Safety under which they occur, and the
Efficiency of task performance. The full list of Mission Efficiency measures appears in Table 7
and is discussed in the subsequent subsections.


Table 7. DCAP Mission Performance Metrics.

Safety (Error-based)
• Number of limit violations
    o Fuel
    o Time
        ■ Pilot
        ■ Crew
• Foul Deck Time for the Landing Strip
• Time to recover emergency aircraft
• Safety Margin on landing
    o Time to fuel threshold (e-fuel or zero fuel)

Expediency (Time-based)
• Total Time on Deck
    o Aircraft in transit
    o Crew
    o UGVs
• Mission Duration
• Delays
    o WTCrew
    o WTQ-Catapult
    o WTQ-MS

Efficiency (Coverage-based)
• Excess capacity/overload measurement
• Resource Usage Rates

4.2.1.1. Safety (Error-based) Metrics


The Safety category includes measures tracking various possible failure and error conditions
within the system. There are two sources of explicit failures in the system: aircraft-specific
failures and deck resource failures. Failures for the deck concern the failure of specific items,
such as Catapult 1. Aircraft failures occur for one or more aircraft in the system, currently
modeled as some form of fluid leak. Table 8 includes a list of possible failures currently
modeled in the system.


Table 8. Possible Failures in the DCAP Simulation

Failure: Catapult Failure
    Entities Affected: Deck aircraft assigned to that catapult
    Duration: Few minutes to permanent
    Solution requires: New catapult assignments for affected aircraft

Failure: Aircraft Fuel Leak
    Entities Affected: Individual aircraft
    Duration: Until repaired or fuel reaches zero
    Solution requires: Landing the aircraft before fuel level becomes critical (<20%)

Failure: Aircraft Hydraulic Leak
    Entities Affected: Individual aircraft
    Duration: Until repaired or hydraulic fluid level reaches zero
    Solution requires: Landing the aircraft before hydraulic fluid level becomes critical (<20%)

Because fuel and hydraulic fluid are finite resources, the occurrence of a leak requires that
action be taken to land the affected airborne aircraft before these fluid levels reach a critical
state. Such aircraft are labeled emergency aircraft. These critical states are defined as 20% of
the maximum fluid level. Breaching either of these thresholds (fuel or hydraulic fluid) is termed
a limit violation. While the planner cannot control the occurrence of these failures, the
subsequent schedule correction should minimize the occurrence of limit violations (a hard
constraint on the planner).

In establishing the performance of planning corrections for aircraft experiencing failures, three
values can be calculated. The first is the difference between the time of landing and the time of
the failure, termed the Emergency Aircraft Recovery Time (EART), which should be minimized.
Additionally, the remaining fuel/hydraulic fluid levels and the total flight time remaining may
also be calculated. The latter values serve as diagnostics of the relative robustness of the new
plan, depicting how much buffer time was afforded by the solution. These metrics should be
statistically correlated, in that minimizing EART should maximize the level of fuel/hydraulic
fluid remaining at landing. The remaining fluid level can also be used to determine the remaining
excess flight time, describing the amount of time the aircraft could have spent in flight before
a new schedule was required. This third value is likely also statistically correlated to the
first two. While a single value could suffice for statistical analysis, the inclusion of all
three provides additional diagnostic value for the system.
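
The three values described above can be sketched for a single emergency aircraft as follows. This is an illustrative sketch rather than the DCAP implementation: the 20% critical fraction comes from the text, but the record layout, the function names, and the constant fluid loss rate used to convert remaining fluid into remaining flight time are all assumptions.

```python
from dataclasses import dataclass

CRITICAL_FRACTION = 0.20  # critical state: 20% of the maximum fluid level


@dataclass
class EmergencyLanding:
    """One recovered emergency aircraft (hypothetical record layout)."""
    failure_time_s: float    # simulation time at which the leak began
    landing_time_s: float    # simulation time at which the aircraft landed
    fluid_at_landing: float  # fuel or hydraulic fluid remaining at landing
    max_fluid: float         # maximum fluid level, same units
    loss_rate_per_s: float   # ASSUMED constant consumption-plus-leak rate


def eart(rec):
    """Emergency Aircraft Recovery Time: landing time minus failure time."""
    return rec.landing_time_s - rec.failure_time_s


def is_limit_violation(rec):
    """True if the fluid level fell below the critical threshold before landing."""
    return rec.fluid_at_landing < CRITICAL_FRACTION * rec.max_fluid


def excess_flight_time(rec):
    """Seconds of flight left at landing before the critical threshold,
    under the constant-loss-rate assumption; zero if already violated."""
    margin = rec.fluid_at_landing - CRITICAL_FRACTION * rec.max_fluid
    return max(margin, 0.0) / rec.loss_rate_per_s
```

Under the constant-rate assumption the three values move together, which matches the correlation noted in the text: a shorter EART leaves more fluid at landing and therefore more excess flight time.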

An additional, non-explicit error condition also exists in the DCAP simulation. At certain times,
crew or aircraft may move into the landing zone (LZ) during operations, which results in a fouled
deck condition for the landing strip. In this state, no aircraft may land. Higher values of LZ
Foul Time result in increased likelihood of an aircraft being “waved off” and forced to return to
a holding pattern. If this occurs while an aircraft experiencing a fuel or hydraulic leak is
attempting to land, the potential for losing the aircraft and pilot increases significantly.
Thus, while this Foul Time is not a direct failure, higher values induce greater probabilities of
failures into the system. This value should be minimized for recovery scenarios.
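
One way to accumulate LZ Foul Time from logged occupancy intervals is sketched below. It assumes, beyond what the text states, that the deck counts as fouled whenever at least one crew member or aircraft is inside the landing zone, so that overlapping intervals are not double counted. The function name and the `(enter, exit)` tuple format are hypothetical.

```python
def lz_foul_time(intervals):
    """Total time during which at least one entity occupies the landing zone.

    `intervals` is an iterable of (enter_time, exit_time) pairs, one per
    entry of a crew member or aircraft into the LZ. Overlapping intervals
    are merged so that shared time is counted once.
    """
    total = 0.0
    current_start = None
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap: close out the previous merged interval, start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the merged interval if this one ends later.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```

For example, occupancy intervals of (10, 20), (15, 30), and (40, 45) seconds merge into (10, 30) and (40, 45), giving 25 seconds of foul time.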

The focus of this section has been on error-related metrics addressing safety, noted as a
priority by stakeholders interviewed during this research. This priority led to the
classification of several metrics as error-based metrics, even though they are time-based,
because of their overriding safety value. Several additional time metrics are used in this study,
primarily as diagnostic measures of efficiency. These are discussed in the following section.

4.2.1.2. Expediency (Time-based) Metrics

Time-based measures for the DCAP system address the expediency with which operations are
performed. Within the aircraft carrier operations environment, minimizing the time required to
perform actions minimizes total aircraft fuel consumption while also minimizing risks to the
crew, aircraft, and ground vehicles active in the system (see footnote 10).

As a whole, measures for expediency have been previously used as measures of overall mission
performance and as diagnostic measures for subcomponents and subtasks in the system. For the DCAP
system, both forms are utilized. The overall Mission Duration is calculated as a measure of
overall performance. It is defined as the elapsed time from the start of the simulation to the
point that a terminal end condition (based on the scenario definition) is reached. Ideally, the
system should execute the schedule in a minimum amount of time, launching aircraft as quickly as
possible from the deck, or maximizing the rate at which airborne aircraft are allowed to land.


10 It is the belief of the Naval personnel interviewed in this research that decreasing the active time of
aircraft, crew, and support vehicles decreases the cumulative probability of that entity experiencing an
accident.

For the expediency measures, “Active time” is defined as the total amount of time any person or
vehicle is actively engaged in a task on the carrier deck. For example, an aircraft accumulates
active time while it is fueling or is taking off, but not while it is parked or otherwise idle.
These values can be calculated for individual aircraft, crew, or deck support vehicles, as well
as summed for the entirety of each group. For crew, lower active time values equate to a lower
likelihood of injury and a lower level of fatigue. The same is also true for ground vehicles,
although injuries and fatigue are replaced by maintenance issues and fuel constraints,
respectively. For aircraft, lower active times imply lower fatigue for the pilots and less fuel
consumption, as well as a lowered risk of possible collisions. Aircraft are also given measures
of Taxi Time, denoting the amount of time aircraft are engaged in taxi tasks, including time
spent waiting for crew or clearance to move. This is a subset of the Active Time for aircraft and
is included as a diagnostic measure to determine how well the system has allocated movement tasks
on the deck. For all of these measures (individual or collective), lower values are desirable.

Additionally, three metrics measuring system delays are included, influenced by Cummings and
Mitchell’s wait times for human interaction [35]: Wait Time in Queue at Catapult (WTQC), Wait
Time in Queue for Crew (WTQCrew), and Wait Time in Queue for Marshal Stack (WTQMS). These track
the total wait time aircraft incur while waiting in the processing queue at a catapult, waiting
for a crewmember to arrive and perform a task, or waiting in the marshal queue for landing
clearance. Higher values of WTQC and WTQCrew imply that aircraft are actively burning fuel while
waiting for another aircraft or crewmember to complete a task. Ideally, aircraft would have only
minimum wait times, saving fuel. Lower values of WTQMS are also desirable, as this value depicts
the total time aircraft are in holding patterns waiting to land, consuming limited fuel
resources. If these times are too high, pilots and aircraft are in danger of having insufficient
fuel to land on the carrier deck.
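
The three delay metrics are totals over all aircraft, so they can be accumulated from a log of individual waits. The sketch below assumes a hypothetical log format in which each record names the aircraft, the kind of queue, and the start and end times of the wait; neither the format nor the function name comes from the DCAP software.

```python
# Maps a (hypothetical) queue label in the log to the metric named in the text.
QUEUE_METRICS = {
    "catapult": "WTQC",        # Wait Time in Queue at Catapult
    "crew": "WTQCrew",         # Wait Time in Queue for Crew
    "marshal_stack": "WTQMS",  # Wait Time in Queue for Marshal Stack
}


def total_wait_times(wait_records):
    """Sum wait durations per metric over all aircraft.

    `wait_records` is an iterable of (aircraft_id, queue_kind, start_s, end_s).
    """
    totals = {metric: 0.0 for metric in QUEUE_METRICS.values()}
    for _aircraft_id, queue_kind, start_s, end_s in wait_records:
        totals[QUEUE_METRICS[queue_kind]] += end_s - start_s
    return totals


if __name__ == "__main__":
    log = [
        ("A1", "catapult", 100, 160),
        ("A2", "crew", 50, 80),
        ("A1", "marshal_stack", 0, 300),
    ]
    print(total_wait_times(log))
```

Keeping the per-aircraft identifier in each record means the same log could also be grouped by aircraft if an individual breakdown is wanted.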

The time measurements covered in this section function primarily as diagnostic measures of the
efficiency of the holistic system. Additional diagnostic measures can also be incorporated for
the deck resources, specifically for the deck catapults. These measures of coverage establish how
well tasks were allocated between the four catapults during the course of a simulation and are
covered in the following section.

4.2.1.3. Efficiency (Coverage-based) Metrics

Efficiency measures defined for this category address the distribution of tasks in the system,
measuring the number of launches at each catapult as well as the launch rate (launches per
mission duration). Due to the nature of the deck environment, it is desirable to have a balanced
distribution of launch tasks between catapults. Launches cannot occur simultaneously for
catapults within each of the forward and aft pairs (within Catapults 1 and 2 or within Catapults
3 and 4, respectively). However, launches can occur simultaneously across pairs (i.e., Catapult 3
may launch while either of Catapult 1 or 2 is launching). This implies that distributing tasks
across the catapult pairs may increase launch efficiency. Additionally, an even distribution of
launch tasks within a pair of catapults also creates a slight performance gain. If two aircraft
are assigned to launch at neighboring catapults, the first aircraft to arrive will immediately
begin launch operations. The second aircraft is allowed to taxi onto the neighboring catapult,
saving some time in the takeoff process, even though it must wait to begin launch preparations.
Due to these characteristics, it is desirable to balance launch tasks between the fore and aft
catapult pairs as well as among catapults within a single pair. This assignment strategy should
also maximize the overall launch rate of the system. Additionally, these measures of launch rate
could also be applied according to queuing theory and establish a theoretical maximum for launch
capabilities.
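
The coverage measures described above (launches per catapult, launches per catapult pair, and launch rate) can be computed from the sequence of catapults used during a run. This is a minimal sketch under the pairing given in the text (Catapults 1 and 2 forward, 3 and 4 aft); `launch_summary` and its input format are hypothetical.

```python
from collections import Counter

# Pairing from the text: Catapults 1 and 2 form the forward pair,
# Catapults 3 and 4 form the aft pair.
CATAPULT_PAIR = {1: "forward", 2: "forward", 3: "aft", 4: "aft"}


def launch_summary(launch_catapults, mission_duration_s):
    """Return (launches per catapult, launches per pair, launch rate).

    `launch_catapults` lists the catapult number used for each launch;
    the launch rate is launches divided by mission duration.
    """
    per_catapult = Counter(launch_catapults)
    per_pair = Counter(CATAPULT_PAIR[c] for c in launch_catapults)
    launch_rate = len(launch_catapults) / mission_duration_s
    return per_catapult, per_pair, launch_rate


if __name__ == "__main__":
    per_catapult, per_pair, rate = launch_summary([2, 4, 2, 4, 2, 4], 1800.0)
    print(dict(per_catapult), dict(per_pair), rate)
```

Comparing the per-pair and per-catapult counts across planning conditions gives a direct view of how evenly each planner balanced launch tasks.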

Combined, these measures of Error-, Time-, and Coverage-based efficiency serve to provide
descriptive evidence to quantify the differences in performance between planning conditions and
diagnostic support necessary for determining the mechanisms that created these differences.
However, these measures only apply to mission tasks, and examine the performance of the solution
as it is executed. Additional measures are needed to establish the effectiveness of the human
operator and the algorithm in the schedule creation process. Measures for the algorithm fall into
the class of Autonomous Platform Behavior Efficiency metrics, which are discussed in the next
section.

4.2.2.  Autonomous  Platform  Behavior  Efficiency  Metrics 

Measures of Automation Behavior Efficiency address how well the algorithm supports system
operations and include subcategories of Usability, Adequacy, Autonomy, and Self-Awareness [26,
27]. Usability concerns the interaction of the algorithm and the human operator through the
system interfaces. Adequacy addresses the computational efficiency of the algorithm, including
speed of computation and error rates. Autonomy concerns how well the system works while not
experiencing direct human interaction. Self-Awareness addresses the capability of the algorithm
to examine (and possibly correct) its own performance. The metrics defined for these classes are
listed in Table 9 and explained in the remainder of this section. For the purposes of this system
evaluation, only Autonomy and Adequacy are addressed, as the remaining two classes
(Self-Awareness and Usability) are not applicable. The algorithms currently in use in the DCAP
system are not self-aware, thus this class of measures cannot be used in testing. Usability
measures are best applied when using a varied user population; the use of the single Expert User
would not give adequate information for judging usability, thus negating the use of this class.
However, future evaluations of system usability are planned.


Table 9. DCAP Automation Behavior Efficiency Metrics [26, 27].

Usability
• Usability survey
    o Learnability
    o User satisfaction

Adequacy
• Reliability
    o Number of errors
        ■ Infeasible schedules
        ■ Incomplete schedules
• Performance Benchmarks
    o Processing time
    o Required memory

Autonomy
• Number of near-misses
    o Halo Violations
    o Halo Violation Durations

Self-Awareness
• Not in this test format


Many  potential  measures  of  algorithm  Adequacy  are  interdependent  with  other  meas¬ 
ures  since  the  quality  of  the  returned  solution  is  not  solely  dependent  on  the  algorithm. 
The  priority  and  constraint  inputs  by  the  operator  affect  the  final  solutions  generated  by 
the  algorithm,  such  that  neither  the  operator  nor  the  algorithm  is  solely  responsible  for  the 
resulting  schedule.  However,  several  measures  of  algorithm  adequacy  can  be  developed. 
The number of failures in the algorithm should be tracked in order to establish the
reliability and stability of the algorithm. Processing time of the algorithm (Wait Time
due to algorithm Processing, WTP) is included, as it is often used in analyses of
algorithm performance [24, 25].

Measures  of  Autonomy  include  metrics  that  address  the  efficiency  of  the  embedded 
vehicle  router  and  track  the  occurrence  and  duration  of  proximity  violations  that  may  oc¬ 
cur.  Figure  16  provides  a  depiction  of  the  collision  avoidance  “halo”  that  surrounds  each 
aircraft.  The  system  tracks  the  number  of  times  the  halo  is  violated  and  the  duration  of 
time  this  violation  exists.  The  actual  diameter  of  the  halo  is  forty-four  feet,  equal  to  the 
wingspan  of  an  F-18,  the  most  common  aircraft  in  the  Naval  fleet.  Because  the  system 
treats  aircraft  as  individual  point  masses  (physical  constraints  are  not  currently  modeled), 
this  measurement  was  adapted  to  the  other  aircraft  as  well.  In  Figure  16,  the  crewmem¬ 
ber’s  (blue  dot)  act  of  crossing  the  halo  violation  line  would  increase  the  violation  count 
by  one.  As  long  as  this  crewman  is  within  the  halo  area,  time  is  added  to  the  Halo  Viola¬ 
tion  Duration  (HV-D)  measure.  Time  is  no  longer  added  to  the  duration  measure  once  the 
crewmember exits the halo area. However, if the crewmember reenters the halo area, the
count is again increased by one and time resumes being added to the duration measure.
The  addition  of  time  to  the  duration  measure  is  agnostic  of  the  number  of  crewmembers 
within  the  halo  and  is  sensitive  only  to  the  presence  of  a  violation. 
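
The counting rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DCAP implementation: the function name `halo_metrics`, the fixed sampling interval `dt`, the stationary aircraft position, and the strict "inside" test are all hypothetical choices; only the 44-foot halo diameter and the counting and duration rules come from the text.

```python
import math

HALO_RADIUS_FT = 22.0  # half of the 44 ft halo diameter given in the text


def halo_metrics(aircraft_xy, crew_tracks, dt):
    """Return (violation_count, violation_duration) for one aircraft.

    aircraft_xy -- (x, y) of the aircraft point mass in feet (assumed fixed)
    crew_tracks -- list of tracks; each track is a list of (x, y) samples,
                   all tracks assumed to have the same length
    dt          -- seconds between consecutive samples (assumed constant)
    """
    n_steps = len(crew_tracks[0]) if crew_tracks else 0
    was_inside = [False] * len(crew_tracks)
    count = 0
    duration = 0.0
    for step in range(n_steps):
        any_inside = False
        for i, track in enumerate(crew_tracks):
            x, y = track[step]
            dist = math.hypot(x - aircraft_xy[0], y - aircraft_xy[1])
            inside = dist < HALO_RADIUS_FT
            if inside and not was_inside[i]:
                count += 1  # every entry or re-entry is a new violation
            was_inside[i] = inside
            any_inside = any_inside or inside
        if any_inside:
            # time accrues once per step, regardless of how many crew are inside
            duration += dt
    return count, duration
```

Because the duration is incremented once per time step whenever any crewmember is inside the halo, two simultaneous violations add two to the count but do not double the duration, which matches the "agnostic of the number of crewmembers" rule.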


Figure  16.  Collision  avoidance  "halo"  for  aircraft.  Blue  dot  represents  crew,  gold  dot  with 
black  ring  signifies  environment  model  of  aircraft  (point  mass). 

4.2.3. Human Behavior Efficiency Metrics


Human  Behavior  Efficiency  metrics  consider  the  physical  and  cognitive  aspects  of 
user  interaction  with  the  system  displays  and  controls.  Pina  et  al.  [26,  27]  divided  these 
into  two  subcategories  for  Attention  Allocation  Efficiency  (AAE,  addressing  cognitive 
aspects)  and  Information  Processing  Efficiency  (IPE,  addressing  physical  aspects).  The 
list  of  Human  Behavior  Efficiency  metrics  applied  to  DCAP  appears  in  Table  10. 

AAE  measurements  seek  to  define  how  the  user’s  attentional  resources  are  allocated 
during  system  use.  However,  because  the  testing  program  is  utilizing  a  single  Expert 
User,  these  measures  are  not  applicable.  Future  testing  programs  utilizing  a  variety  of 
human  users  should  incorporate  these  measures.  IPE  metrics,  however,  may  still  be  ap¬ 
plied  to  the  system  and  measure  the  efficiency  with  which  the  operator  inputs  commands 
into  the  system.  In  this  case,  the  actions  of  the  Expert  User,  who  is  intimately  familiar 
with  the  system,  can  be  treated  as  an  upper  bound  for  future  users.  Pina  et  al.  [26,  27]  dif¬ 
ferentiated  the  IPE  subclass  into  measures  of  Recognition,  Decision,  Action  Implementa¬ 
tion,  and  Task  Efficiencies.  The  low  number  of  decisions  that  are  made  in  the  DCAP  sys¬ 
tem,  as  well  as  the  use  of  a  single  Expert  User,  negate  the  first  two  subcategories  of  this 
class.  The  remaining  two  classes  -  Action  Implementation  and  Task  Efficiency  -  address 
the  physical  interaction  of  the  user  with  the  system,  establishing  how  effectively  the  user 
translates  decisions  into  actions.  For  DCAP,  these  metrics  were  defined  to  include  the 
number of user interactions (mouse clicks or button presses) during replanning, the
distance of mouse cursor travel, and the total time of interaction with the system. For
instance, a user who is panicked by increased time pressure may begin navigating
incorrect menus in a rush to complete their tasks. Doing so not only increases the time
of activity; backtracking through the incorrect menus also produces more mouse clicks in
the interface and more cursor movement around the screen. Conversely, a user who is
carefully deliberating their actions may also exhibit increased activity time, but will
likely not exhibit increases in mouse clicks and cursor movement. Ideally, the user
would make a decision in minimal time and with minimal actions (mouse clicks) and
cursor movements.
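
As a rough illustration of how the three IPE measures named above could be derived from a logged interaction trace, consider the sketch below. The event format (a time-ordered list of `(t_seconds, kind, x, y)` tuples) and the function name `ipe_metrics` are assumptions made for this example; the source does not describe the DCAP log format.

```python
import math


def ipe_metrics(events):
    """Summarise one replanning episode.

    events -- time-ordered list of (t_seconds, kind, x, y) tuples, where kind
              is "move" for a cursor sample or "click" for a click or button
              press (hypothetical log format)
    Returns (n_interactions, cursor_distance, interaction_time).
    """
    if not events:
        return 0, 0.0, 0.0
    # Action Implementation Efficiency: number of interactions
    n_interactions = sum(1 for _, kind, _, _ in events if kind == "click")
    # Action Implementation Efficiency: distance the cursor travels
    distance = 0.0
    for (_, _, x0, y0), (_, _, x1, y1) in zip(events, events[1:]):
        distance += math.hypot(x1 - x0, y1 - y0)
    # Task Efficiency: time from first to last logged event
    interaction_time = events[-1][0] - events[0][0]
    return n_interactions, distance, interaction_time
```

Comparing the three values together is what separates the two cases in the text: a hurried user raises all three, while a deliberating user raises mainly the interaction time.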

Table  10.  DCAP  Human  Behavior  Efficiency  Metrics  [26,  27] 


Information  Processing  Efficiency 


•  Action  Implementation  Efficiency 
o  Number  of  interactions 
o  Distance  mouse  cursor  travels 

•  Task  Efficiency 
o  Interaction  Time 

The  measures  in  this  section  have  described  the  active  performance  of  the  human  op¬ 
erator,  but  have  not  characterized  how  the  physical  environment  and  the  operator’s  men¬ 
tal  state  affect  this  behavior.  These  measures  of  Human  Behavior  Precursors  are  found  in 
the  following  section. 

4.2.4.  Human  Behavior  Precursors 

Human  Behavior  Precursors  include  the  psychological  and  physical  states  that  affect 
operator  performance.  Pina  et  al.  [26,  27]  divided  cognitive  measures  into  those  that  ad¬ 
dress  operator  workload,  situational  awareness,  and  self-confidence.  While  a  variety  of 
measures  of  workload  can  be  used  [3,  39,  91-94],  utilization  [89,  95,  128,  129]  is  a  direct, 
quantitative  measurement  of  the  time  the  user  interacts  with  the  system.  Measures  of 
utilization require knowledge of the total user interaction time with the system and the
total time of system execution. Utilization is then a measure of the percentage of total
operating time in which the user is actively engaging the system. Higher utilization rates often
result  in  increased  mental  fatigue  on  the  part  of  the  human  operator,  increasing  the  likeli¬ 
hood  of  errors  in  decision-making  or  task  execution.  This  measure  primarily  involves 
user  physical  interaction  with  the  system  (e.g.,  the  IPE  measures),  as  detecting  cognitive 
interaction  is  notoriously  difficult. 
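
Written out, the utilization measure described above is a simple ratio. The symbols below are introduced here only for illustration (they do not appear in the source): T_interact is the total user interaction time and T_total is the total time of system execution.

```latex
\[
  U \;=\; \frac{T_{\mathrm{interact}}}{T_{\mathrm{total}}} \times 100\%
\]
% Illustrative numbers only: 6 minutes of interaction in a 30-minute run
% gives U = 6/30 x 100% = 20%.
```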

Table  11.  Human  Behavior  Precursor  Metrics 
(*  requires  naive  users). 


Workload 


•  Utilization 


•  NASA TLX / Post-test questionnaire*


As  with  metrics  for  Human  Behavior  Efficiency,  the  use  of  an  Expert  User  negates 
certain  measurement  subclasses,  namely  Situational  Awareness  and  Self-confidence. 
However,  measures  from  these  two  classes  should  be  included  in  future  research  pro¬ 
grams  that  include  a  varied  group  of  human  users. 

4.2.5.  Section  Summary 

This  section  has  discussed  the  creation  of  metrics  for  the  analysis  of  the  DCAP  sys¬ 
tem  across  four  major  categories  of  Mission  Efficiency,  Algorithm  Behavior  Efficiency, 
Human Behavior Efficiency, and Human Behavior Precursors. Measures for Mission
Efficiency address the overall performance of the system as it develops and implements
schedules for the aircraft carrier environment. The remaining measures address the
effectiveness of the human operator and the algorithm in supporting this process, as
well as the effectiveness of their interactions with each other. These measures are the
primary mechanisms with which the performance of the DCAP system will be compared
to the human-generated SME plans over a series of testing scenarios. These scenarios are
based  on  realistic  deck  conditions,  representing  three  different  levels  of  complexity 
within  the  environment.  The  following  section  will  detail  the  creation  of  the  schedules 
and  the  determination  of  relative  complexity  for  each. 

4.3.  Testing  Apparatus 

Testing  was  performed  on  a  Lenovo  Thinkpad  W500  laptop  (2.80  GHz  Intel  Core  2 
Duo  T9600  CPU,  8  GB  RAM,  Windows  7  64-bit  operating  system)  using  a  Logitech 
M510 wireless RF mouse. The DCAP simulation ran in an Ubuntu 9.10 virtual machine
hosted in VMware Workstation. Within Ubuntu, the DCAP software (a Java™
application)  was  executed  through  the  Eclipse  Galileo  Java™  IDE.  Data  were  extracted 
by  automated  features  embedded  in  the  Java  code.  Events  were  logged  at  the  time  of  each 
failure,  at  the  time  of  replan  completion,  and  upon  scenario  termination.  Scenario  termi¬ 
nation  was  also  automated  to  ensure  no  variation  in  end  conditions.  Data  files  were  re¬ 
formatted  into  Excel™  spreadsheets,  then  analyzed  in  the  SPSS™  analytical  software 
package.  The  results  of  this  data  analysis  will  be  covered  in  the  following  chapter. 

4.4.  Chapter  Summary 

This chapter has discussed the performance validation testing of the DCAP system,
which focuses on examining the performance of the system in replanning tasks in the
aircraft carrier environment. The chapter began with a discussion of the testing protocol
defined for this program, including a description of the Expert User used to minimize
variations in human input into the system. A first subsection explained the set of Subject
Matter Expert heuristics that would be used by this Expert User in replanning, followed
by a subsection defining the three testing scenarios used in system testing, whose
complexities were based on the number of SME heuristics required in replanning. A third
subsection discussed the statistical power analysis that determined the number of
required trials for the testing program. The second main section of this chapter defined the
metrics used in establishing the performance of the three planning conditions, with
individual subsections addressing the main categories of the metric hierarchy. This chapter’s
final section discussed the testing apparatus used in this experimental program.

5. Results and Discussion


This  chapter  provides  both  quantitative  and  qualitative  analysis  of  the  testing  results 
for  the  no-replan  Baseline  (B),  Human-Only  (HO),  and  Human-Algorithm  (HA)  planning 
conditions.  For  these  three  planning  conditions  and  the  three  test  scenarios  (Simple, 
Moderate,  and  Complex),  a  total  of  nine  different  data  sets  were  generated  (gray  blocks  in 
Figure  17).  This  chapter  compares  the  performance  of  the  planners  within  each  scenario 
(rows  in  Figure  17)  as  well  as  how  planner  performance  varies  across  scenario  levels 
(columns  in  Figure  17). 

Figure 17. Visual depiction of the analyses performed in this testing program.


In  total,  thirty-nine  different  measurement  metrics  were  defined  for  the  DCAP  testing 
program  in  Chapter  4.2.  In  using  three  planning  conditions  (B,  HO,  and  HA)  across  three 
scenarios, a total of 324 possible pairwise comparisons exist. Performing all of these tests
would significantly increase the experiment-wise error rate, that is, the chance that the
experiment as a whole accepts at least one alternative hypothesis that should be rejected.
This risk can be mitigated by reducing the number of individual tests and by lowering the
significance level α used in each. These adjustments are discussed in the first section of
this chapter, with the remaining sections presenting and discussing the results of data
collection.

5.1.  Reducing  Family-wise  Error 

In  performing  statistical  testing  on  groups  of  data,  tests  can  be  divided  into  “families” 
of  tests  applied  to  subsets  of  independent  data.  For  instance,  within  the  DCAP  test  pro¬ 
gram,  the  data  from  each  of  the  three  test  scenarios  forms  its  own  subset  -  tests  applied  to 
the  Moderate  scenario  have  no  relation  to  the  tests  performed  on  data  from  the  Simple 
scenario. In doing so, the family-wise error rate, α_fw, is formally defined as the
“probability of making at least one Type I error in the family of tests when all the null
hypotheses are true” [130]. As the number of statistical tests applied to the experimental
data increases, the likelihood that at least one test in the family experiences Type I error
(accepting the alternative hypothesis when it should be rejected) also increases. For
example, utilizing a significance level α of 0.05 on a single family of statistical tests (each
test has only a 5% chance of Type I error) does not imply that the chance of any test in this
family experiencing Type I error is 0.05. For a study with five statistical tests at an α of
0.05, the likelihood of at least one test experiencing Type I error (α_ew) is 0.23; for 10
tests, 0.40; for 50 tests, 0.92 [131]. Decreasing the number of statistical tests performed
or lowering the significance level α for the remaining tests, or a combination of both,
reduces this likelihood.

In order to decrease the chance of family-wise error in each of the three test scenarios,
two  filters  were  passed  over  the  data  in  order  to  reduce  the  number  of  statistical  tests  be¬ 
ing  performed.  In  the  first  filter,  metrics  that  were  considered  to  have  low  external  valid¬ 
ity11  were  removed  from  consideration.  For  instance,  Wait  Time  in  Queue  due  to  Crew 
(WTQCrew)  is  useful  for  determining  the  efficiency  of  crew  allocation,  but  the  HA  plan¬ 
ner  does  not  plan  tasks  for  the  crew  at  this  time.  Crewmembers  are  assigned  automati¬ 
cally  in  the  simulation,  based  on  their  availability.  Additionally,  stakeholders  in  the  air¬ 
craft  carrier  environment  are  less  concerned  with  these  measures  than  they  are  for  other 
metrics,  such  as  Total  Aircraft  Taxi  Time  (TATT).  A  list  of  the  metrics  removed  and  the 
reasons  for  doing  so  appears  in  Appendix  E. 

In  the  second  filter,  a  Principal  Components  Analysis  (PCA)  was  performed  for  each 
planner-scenario combination, yielding nine correlation matrices which identify
statistically related data within a given planner-scenario combination. These matrices were
analyzed within scenarios (across rows in Figure 17) in order to identify groups of
cross-correlated metrics (sets of metrics that returned Pearson correlations of 0.7 or greater
at a p-value of less than 0.001). For each group of metrics that correlated across planning
conditions,  a  single  metric  was  chosen  for  further  analysis  based  on  high  external  valid¬ 
ity.  For  instance,  PCA  data  for  the  Moderate  scenario  revealed  that  for  all  three  planning 
conditions,  Total  Aircraft  Active  Time  (TAAT),  Total  Active  Time  (TAT),  and  Wait 
Time in Queue in the Marshal Stack (WTQMS) were highly correlated. TAAT was
retained, as it has the highest external validity; operators are highly concerned with the
value of this metric and do not currently calculate WTQMS or TAT. However, this does
not imply that the two remaining metrics should be discarded. WTQMS is not calculated
by operators currently, but was retained for its insight into the system’s efficiency in
landing aircraft. Tables denoting PCA results for each planner-scenario combination and
the resulting cross-correlations appear in Appendix F. At the conclusion of this process,
nineteen metrics remained, a list of which is found in Table 12, along with their
applicable testing scenarios.

11 External validity here relates to how the measure correlates with operators’ views of
system performance. Interviews revealed that operators do not consider Wait Time in Queue
at Catapult, but have a high concern for Total Aircraft Active Time (TAAT). There are also
levels of external validity; some measures are highly significant to operators, while others
are useful, but not greatly valued.
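
The correlation threshold used in the second filter can be illustrated with a short sketch. It is not the PCA procedure itself (the source ran that in SPSS), and it omits the p < 0.001 significance check, which would normally come from a statistics package. The function names and the input layout (metric name mapped to a list of per-trial values) are assumptions made for this example.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(data, threshold=0.7):
    """List metric pairs whose correlation meets the threshold.

    data -- dict mapping metric name to a list of per-trial values
            (hypothetical layout; constant-valued metrics are not handled)
    """
    pairs = []
    for a, b in combinations(sorted(data), 2):
        r = pearson_r(data[a], data[b])
        if r >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs
```

Pairs returned for all three planning conditions within a scenario would form the candidate groups from which a single metric is kept on the basis of external validity.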


Table 12. Final list of DCAP test metrics, with applicable scenarios (check marks, ✓,
signify the test is applicable for the scenario; - signifies no check mark).

Class        Metric                              Abbreviation  Simple  Moderate  Complex
ME-Error     Fuel Violations                     FV            -       -         ✓
ME-Error     Landing Zone Foul Time              LZFT          ✓       ✓         -
ME-Error     FMAC #6 Hydraulic Fluid Remaining   FMAC #6 HFR   -       ✓         -
ME-Error     FMAC #6 Recovery Time               FMAC6EART     -       ✓         -
ME-Error     SMAC #2 Active Time                 SMAC2AAT      -       ✓         ✓
ME-Error     SMAC #2 Fuel Remaining              SMAC2EFR      -       ✓         ✓
ME-Error     SMAC #2 Recovery Time               SMAC2EART     -       ✓         ✓
ME-Time      Total Aircraft Taxi Time            TATT          ✓       ✓         ✓
ME-Time      Total Aircraft Active Time          TAAT          ✓       ✓         ✓
ME-Time      Total Crew Active Time              TCAT          ✓       ✓         ✓
ME-Time      Wait Time in Queue - Marshal Stack  WTQMS         -       ✓         ✓
ME-Time      Mission Duration                    MD            ✓       ✓         ✓
ME-Coverage  Catapult 2 Launch Rate              C2LR          ✓       -         ✓
ME-Coverage  Catapult 3 Launch Rate              C3LR          ✓       -         ✓
ME-Coverage  Catapult 4 Launch Rate              C4LR          ✓       -         ✓
ME-Coverage  Total Catapult Launch Rate          TCLR          ✓       -         ✓
ABE-Auto     Halo Violations                     HV            ✓       ✓         ✓
ABE-Auto     Halo Violation Duration             HVD           ✓       ✓         ✓
HBE-IPE      User Interaction Time               UIT           ✓       ✓         ✓
Total number of statistical comparisons                        34      40        49

As noted in the table, some scenarios did not require certain metrics. Fuel Violations
(FV) did not occur in the Simple and Moderate scenarios, but did occur in the Complex
scenario; thus, only the Complex case contains a check for FV. The Moderate case did
not require any aircraft launches, thus no Catapult Launch Rate measures are required.
For the metrics that remain, there are typically three statistical comparisons to be
performed: B vs. HO, B vs. HA, and HO vs. HA. The only metric that does not require
three tests is the User Interaction Time metric, because no user interaction was performed
in the B cases. This results in totals of 34, 40, and 49 statistical comparisons for the three
scenarios (Simple, Moderate, and Complex, respectively). Given this number of tests, the
comparison-wise significance level for each statistical test, α, can be calculated to
preserve an overall family-wise significance level, α_fw. These values are found in Table 13.

Table  13.  Test  significance  levels  desired  for  family-wise  significance  level  of  0.05. 


Scenario   Number of tests   Family-wise significance level   Test significance level
Simple     34                0.05                             0.0015
Moderate   40                0.05                             0.0013
Complex    49                0.05                             0.0010
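
The figures quoted in Section 5.1 and in Table 13 can be reproduced with a few lines of Python. The source does not say which correction was used to obtain the per-test levels; the sketch below assumes independent tests and the Šidák form, with the simpler Bonferroni division shown for comparison. The function names are illustrative only.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one Type I error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def sidak_alpha(alpha_fw, n_tests):
    """Per-test level that holds the family-wise level at alpha_fw
    (Sidak form; assumed here, not stated in the source)."""
    return 1.0 - (1.0 - alpha_fw) ** (1.0 / n_tests)


def bonferroni_alpha(alpha_fw, n_tests):
    """Per-test level under the more conservative Bonferroni division."""
    return alpha_fw / n_tests


if __name__ == "__main__":
    for k in (5, 10, 50):
        print(k, "tests at 0.05:", round(familywise_error(0.05, k), 2))
    for scenario, k in (("Simple", 34), ("Moderate", 40), ("Complex", 49)):
        print(scenario, k, sidak_alpha(0.05, k), bonferroni_alpha(0.05, k))
```

Under these assumptions, `familywise_error` returns the 0.23, 0.40, and 0.92 values cited for 5, 10, and 50 tests, and `sidak_alpha` rounds to the Table 13 entries for 34, 40, and 49 tests.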

ANalysis  Of  Variance  (ANOVA)  tests  were  desired  for  the  statistical  comparisons 
and  require  that  the  data  exhibit  both  normality  and  homoskedasticity.  Data  were  first 
tested  for  normality.  For  data  that  showed  normality,  Levene  tests  for  heteroskedasticity 
were  performed,  with  additional  transformations  attempted  if  tests  showed  heteroskedas¬ 
ticity.  For  data  that  tested  both  normal  and  homoskedastic,  parametric  ANOVA  tests 
were  acceptable  and  were  utilized  in  the  analysis.  For  all  other  cases,  non-parametric 
Mann-Whitney U tests were used to compare distributions. Full outputs of the
Kolmogorov-Smirnov tests for normality and Levene tests for heteroskedasticity appear in
Appendix G and Appendix H, respectively. The remaining sections in this chapter
present the results of this statistical testing in tabular form; for brevity, boxplots for these
data have been placed in Appendix I (Simple scenario), Appendix J (Moderate scenario),
and Appendix K (Complex scenario).
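
The test-selection logic described above can be sketched as follows. This is a simplified illustration rather than the SPSS procedure used in the study: the normality and Levene outcomes are passed in as booleans, the transformation step is omitted, and only the Mann-Whitney U statistic (not its p-value) is computed. Function names are assumptions.

```python
def choose_test(is_normal, is_homoskedastic):
    """Return the test family implied by the two assumption checks."""
    if is_normal and is_homoskedastic:
        return "ANOVA"
    return "Mann-Whitney U"


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a, using average ranks for tied values."""
    pooled = sorted(list(sample_a) + list(sample_b))
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # positions i+1 .. j (1-based) share the average rank
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    rank_sum_a = sum(rank_of[v] for v in sample_a)
    n_a = len(sample_a)
    return rank_sum_a - n_a * (n_a + 1) / 2.0
```

A U value of zero, or of n_a × n_b, indicates complete separation of the two samples; converting U to a significance value would be left to a statistics package.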

5.2. Within-Scenario Results

This section presents the results of statistical testing within each scenario.
Comparisons of performance are drawn between metrics from all three planning conditions
(Baseline, Human-Only, and Human-Algorithm), with a focus on the performance of the HA
planner. Tables are presented providing the results of statistical testing, with the type of
test performed and the resulting significance values reported for each test. Additional
tables provide qualitative analyses of the relationships within the data. These two forms of
analysis aid in the identification of differences in performance between the HA and HO
planners, which are discussed at the end of each section. In these discussions, the metrics
are also used to explain differences in the data, revealing not only errors on the part of the
planner, but also shortcomings in the operator heuristics.

While there is a large amount of data that can be analyzed, each section focuses
primarily on the metrics that support the identification of differences in performance
between the HO and HA planners and their subsequent explanation. The remaining
measures are noted in the Appendices for completeness, but may not be specifically discussed.

5.2.1.  Simple  Scenario 

The Simple scenario (a launch scenario) included twenty aircraft launched from the
carrier deck. During launch procedures, one of the aft catapults failed, requiring the
reassignment of launch tasks among the remaining operational catapults. Examination of the
performance of the planners in this scenario revealed that the Human-Only plans
outperformed the Human-Algorithm plans, but, as expected, neither was able to reach the same
level of performance as the Baseline condition.

The primary descriptive metric for performance in this scenario is Mission Duration
(MD). Measures for the launch rates of Catapult 2 and Catapult 4 and the total launch rate
(C2LR, C4LR, and TCLR, respectively) were used in a diagnostic role, aiding in
identifying the causes of the HA planner’s poor performance as compared to the HO planner.
The following sections cover the statistical testing performed on these metrics, a
qualitative analysis of the differences in these metrics, and a discussion of the implications of
the results and how they revealed superior performance on the part of the HO planner.

5.2.1.1. Results of Statistical Testing

Table  14  presents  a  compilation  of  the  statistical  testing  data  for  the  Simple  scenario. 
Within  this  table,  the  metric  name  and  its  desired  magnitude  (High  or  Low)  are  presented, 
followed  by  columns  detailing  the  results  of  statistical  testing  between  pairs  of  planning 
conditions.  These  columns  list  the  statistical  test  applied,  the  results  of  the  test,  and  the 
relative difference between the two conditions. Significant values (p < 0.0015) imply that
the null hypothesis (H0: distributions of the two planning conditions are identical) was
rejected. The relative difference between two data sets takes one of three forms:

1. The distributions of Planners 1 and 2 were not significantly different (H0 could not
be rejected in statistical testing).

2. The median value for Planner 1 was greater than that of Planner 2.

3. The median value for Planner 1 was less than that of Planner 2.

Given these relationships, the superiority of a planner for a given parameter is based on
the desired magnitude of the metric. For instance, LZFT is desired to be lower; thus,
because B < HO in the Simple scenario, the Baseline has better performance in this respect.
Reviewing the data in Table 14, it can be seen that all but three comparisons (all for
HV-D) were shown to be significantly different (see footnote 12). Within the remaining
statistically significant results, the Baseline condition was shown to have superior
performance for all measures except for certain catapult launch rates. The HO planner was
shown to outperform the Baseline in measures of C2LR and C4LR, while the HA planner also
outperformed the Baseline in C4LR (this seemingly counterintuitive result of superior
launch rates after the occurrence of failures is discussed in the next section). Comparing
the HO and HA plans, the HA planner was shown to be superior in terms of C4LR and
UIT, with the HO planner maintaining superiority in all other metrics.
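
The way a table entry is read can be summarised in a short sketch. It assumes a p-value and the two raw samples are available for each comparison; the function names and the convention of returning planner 1 or 2 are choices made for this example, and the case of a significant difference with exactly equal medians is not handled specially.

```python
from statistics import median


def relation(p_value, sample_1, sample_2, alpha):
    """Return "=", ">" or "<" for Planner 1 relative to Planner 2."""
    if p_value >= alpha:
        return "="  # H0 could not be rejected
    return ">" if median(sample_1) > median(sample_2) else "<"


def better_planner(rel, desired):
    """Pick the superior planner given the desired magnitude.

    rel     -- "=", ">" or "<" as returned by relation()
    desired -- "HIGH" or "LOW"
    Returns 1, 2, or None when no significant difference was found.
    """
    if rel == "=":
        return None
    first_is_larger = rel == ">"
    return 1 if first_is_larger == (desired == "HIGH") else 2
```

For example, an LZFT entry of B < HO with a LOW desired magnitude makes Planner 1 (the Baseline) superior, while a C4LR entry of HO < HA with a HIGH desired magnitude makes Planner 2 (HA) superior.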


Table 14. Results of statistical testing for the Simple scenario (* signifies significance at
α = 0.0015; NP = Non-Parametric Mann-Whitney U Test).

Metric  Desired     B vs. HO                    B vs. HA                    HO vs. HA
        Magnitude   Test  p-value  Relation     Test  p-value  Relation     Test  p-value  Relation
LZFT    LOW         NP    *        B<HO         NP    *        B<HA         NP    *        HO<HA
TATT    LOW         NP    *        B<HO         NP    *        B<HA         NP    *        HO<HA
TAAT    LOW         NP    *        B<HO         NP    *        B<HA         NP    *        HO<HA
TCAT    LOW         NP    *        B<HO         NP    *        B<HA         NP    *        HO<HA
MD      LOW         NP    *        B<HO         P     *        B<HA         NP    *        HO<HA
C2LR    HIGH        NP    *        B<HO         NP    *        B>HA         NP    *        HO>HA
C3LR    HIGH        -     -        -            -     -        -            -     -        -
C4LR    HIGH        NP    *        B<HO         NP    *        B<HA         NP    *        HO<HA
TCLR    HIGH        NP    *        B>HO         NP    *        B>HA         NP    *        HO>HA
HV      LOW         NP    *        B<HO         NP    *        B<HA         NP    *        HO<HA
HV-D    LOW         NP    p=0.005  B=HO         NP    p=0.906  B=HA         NP    p=0.535  HO=HA
UIT     LOW         NP    -        -            NP    -        -            NP    *        HO>HA

12  Note  that,  due  to  the  failure  of  Catapult  3,  comparisons  of  C3LR  were  not  performed. 

5.2.1.2. Discussion


One interesting note from Table 14 is that the HO planner outperformed the Baseline
in all launch rate values for Catapults 2 and 4 while the HA planner outperformed the
Baseline only in C4LR, even with the HO and HA planners having an overall longer
mission duration value. While this seems counterintuitive, recall that the Launch Rate
(launches per minute) is determined by the following equation:

\[ \mathrm{LaunchRate} = \frac{n_{launches}}{\mathrm{MissionDuration}} \tag{3} \]

where n_launches is the number of launches assigned to that catapult and MissionDuration
is the final calculated Mission Duration value. In this equation, increasing the number of
launches at a single catapult over a given mission duration X will increase the Launch
Rate of that catapult. An increase in Launch Rate will also occur for Mission Duration
values slightly greater than X, but this does not continue unabated. If increases in mission
duration continue unchecked, the value for the launch rate will begin to decrease. For
instance, with an initial n_launches of 5 and Mission Duration of 15 minutes, increasing the
number of launches by three increases launch rate for all Mission Duration values less
than 24 minutes (8 launches in 24 minutes gives the same rate as 5 launches in 15
minutes). As a result of this, the reallocation of launch tasks by the HO and HA
planners to Catapults 2 and 4 and to Catapult 4, respectively, provided sufficiently large
increases in n_launches to outweigh increases in Mission Duration (Figure 18 and Figure 19).
However, because the number of launches in the system is fixed at twenty, these increases
in Mission Duration cause detrimental effects in the Total Catapult Launch Rate (Figure
20).
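
The break-even point in the worked example above follows directly from Equation 3. The short sketch below makes the arithmetic explicit; the function names are illustrative and the numbers are those used in the text.

```python
def launch_rate(n_launches, mission_duration_min):
    """Equation 3: launches per minute for one catapult."""
    return n_launches / mission_duration_min


def break_even_duration(n_before, duration_before_min, n_added):
    """Mission duration at which the new launch rate equals the old one."""
    return duration_before_min * (n_before + n_added) / n_before


if __name__ == "__main__":
    # 5 launches in 15 minutes, then 3 more launches assigned to the catapult
    print(break_even_duration(5, 15, 3))
    print(launch_rate(5, 15), launch_rate(8, 23), launch_rate(8, 25))
```

Any mission duration below the break-even value leaves the catapult with a higher launch rate than before the reassignment, even though the mission itself takes longer.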

Figure 18. Simple scenario, Catapult 2 launch rate (launches per minute).

Figure 19. Simple scenario, Catapult 4 launch rate (launches per minute).

Figure 20. Simple scenario, Total Catapult Launch Rate (launches per minute).

In terms of a general performance comparison between the HO and HA planners, the
most important point is that the HO planner performed better in Mission Duration. This
measure directly addresses the total time required to launch all aircraft from the carrier
deck and reach mission, a measure of primary importance in the wake of a launch
catapult failure. Catapult launch rates from this scenario also explain the changes
in Mission Duration: the distribution of launch tasks between all remaining catapults by
the HO planner resulted in a better total launch rate than that of the HA planner. Here, a
detailed discussion of the relative magnitudes of the individual catapult launch rates is
of less concern than the fact that the HA planner did not assign aircraft to Catapult
2 in the majority of cases (Figure 18). The HO planner, based on SME heuristics,
leveraged all available resources in order to achieve a faster processing time than the
automated planner (signified by increases in TCLR in Figure 20).

While this review of quantitative data helps to identify the actions of the planner that
created poor performance ratings, the data are still not sufficiently detailed to determine
the specific failures in logic of the HA planner. However, these data support the creation
of two rival explanations as to why the HA planner made no assignments to Catapult 2.
In the first explanation, the algorithm optimization may have shown that assigning all
aircraft to Catapult 4 was the theoretical optimum for these cases. In the second, the
algorithm state data could have returned a faulty value or made an incorrect assumption
concerning the availability of Catapult 2 during the replanning stage, forcing the planner to
make allocations to only a single catapult.

After investigation, the latter explanation was shown to be correct - the algorithm
incorrectly considered a certain transient deck condition to be permanent, leading to the
assumption that Catapult 2 was also unavailable. This then led to an inappropriate
distribution of tasks on the deck, utilizing only a single resource. This unnecessary
constraint on operations, leading to increased active time, taxi time, and mission duration
values, is an example of the brittleness that often accompanies automated algorithms.
Without prior coded knowledge concerning the transient nature of this condition, the
algorithm was unable to properly compensate for its occurrence. Instead, the planner
considered this event to be a permanent failure of the catapult, which is the only failure
the algorithm could recognize. The planner then constructed a plan that was near optimal
given this faulty system information. Correcting this error in state translation may allow
the planning algorithm to generate plans as good as or better than those developed by the
HO planner in this round of testing.
This explanation would not have been reached without the inclusion of the additional
launch rate metrics within the Mission Efficiency - Coverage subclass. Although these
measures are not primary measures of mission performance, in this case, they were helpful
in identifying certain inappropriate actions on behalf of the HA planner.

5.2.2.  Moderate  Scenario 

The Moderate scenario (a recovery scenario) required the safe landing of twenty aircraft
currently in flight. During landing procedures, two aircraft (SMAC #2 and FMAC #6)
encountered failures (high priority fuel leak and low priority hydraulic leak,
respectively). This required reassigning the landing order of aircraft to ensure that both
of these aircraft landed before encountering a Fuel or Hydraulic Fluid Violation (FV or
HFV), respectively. In examining the performance of the planners in this scenario, mixed
results between the planning conditions were seen. The HA planner outperformed the HO
planner in measures that addressed global performance (such as Mission Duration and Total
Aircraft Active Time), while the HO planner maintained superior performance in measures
addressing the high priority aircraft (measures for the failed SMAC). The following
section contains the results of the statistical testing, followed by a section discussing
these results.

5.2.2.1. Results of Statistical Testing

Table 15 presents a compilation of the statistical testing data for the Moderate scenario.
This table lists the metric name, its desired magnitude (High or Low), and details of the
statistical testing between pairs of planning conditions. Significant values (p < 0.0013)
imply that the null hypothesis (H0: the distributions of the two planning conditions are
not different) was rejected. Testing revealed several instances of statistically
equivalent performance, most notably for LZFT, HV, and HV-D, where no differences were
seen between planning conditions. Additionally, the Baseline and HO planners were found
to be equivalent for TAAT and TCAT, while the Baseline and HA planners were equivalent
for TATT and TCAT. The remaining measures all resulted in statistical differences between
planning conditions, most importantly among Mission Duration and measures for SMAC #2
and FMAC #6.
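
As a rough illustration of how the Relationship entries in Table 15 can be read, the sketch below maps a p-value and the direction of a difference onto the "=", "<", ">" notation, and checks whether a change moved in the desired direction. The function names, and the use of a single summary value (such as a mean or median) per planning condition, are assumptions made for illustration; they are not taken from the DCAP analysis itself.

```python
ALPHA_MODERATE = 0.0013  # significance level quoted for the Moderate scenario


def relationship(name_a, center_a, name_b, center_b, p_value, alpha=ALPHA_MODERATE):
    """Return a Table 15 style relationship string such as 'B=HO' or 'HO>HA'.

    center_a and center_b are hypothetical summary values (for example a mean
    or median) used only to decide the direction of a significant difference.
    """
    if p_value >= alpha:
        # Null hypothesis not rejected: conditions treated as equivalent.
        return f"{name_a}={name_b}"
    sign = "<" if center_a < center_b else ">"
    return f"{name_a}{sign}{name_b}"


def is_improvement(center_new, center_ref, desired):
    """True if the new value moved in the desired direction ('LOW' or 'HIGH')."""
    if desired == "LOW":
        return center_new < center_ref
    if desired == "HIGH":
        return center_new > center_ref
    raise ValueError(f"unknown desired magnitude: {desired!r}")
```

Under these assumptions, a metric with a desired magnitude of LOW and a relationship of HO<HA would count in favor of the HO planner, while the same relationship on a HIGH metric would count in favor of the HA planner.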


Table 15. Results of statistical testing for the Moderate Scenario (* signifies significance
at α = 0.0013; NP = Non-Parametric Mann-Whitney U Test; P = Parametric ANOVA).

              Desired    B vs. HO                     B vs. HA                     HO vs. HA
Metric        Magnitude  Test  p-value  Relationship  Test  p-value  Relationship  Test  p-value  Relationship
LZFT          LOW        P     p=0.733  B=HO          P     p=0.052  B=HA          P     p=0.023  HO=HA
FMAC 6 HFR    HIGH       NP    *        B>HO          NP    *        B>HA          NP    *        HO<HA
FMAC 6 EART   LOW        NP    *        B<HO          NP    *        B<HA          NP    *        HO>HA
SMAC 2 AAT    LOW        P     *        B>HO          P     *        B>HA          P     *        HO<HA
SMAC 2 EFR    HIGH       P     *        B<HO          NP    *        B<HA          NP    *        HO>HA
SMAC 2 EART   HIGH       NP    *        B>HO          NP    *        B>HA          NP    *        HO<HA
TATT          LOW        NP    *        B>HO          P     p=0.032  B=HA          P     *        HO<HA
TAAT          LOW        P     p=0.132  B=HO          NP    *        B<HA          NP    *        HO>HA
TCAT          LOW        NP    p=0.018  B=HO          P     p=0.021  B=HA          P     *        HO<HA
WTQMS         LOW        NP    *        B<HO          NP    *        B<HA          NP    *        HO>HA
MD            LOW        NP    *        B<HO          NP    *        B<HA          NP    *        HO>HA
HV            LOW        NP    p=0.909  B=HO          NP    p=0.001  B>HA          NP    p=0.002  HO=HA
HV-D          LOW        P     p=0.005  B=HO          NP    p=0.046  B=HA          P     p=0.402  HO=HA
UIT           LOW        -     -        -             -     -        -             NP    *        HO<HA


Reviewing the relationships in this data, two major themes arise. First, the HO and
HA planners developed schedules that addressed the two aircraft failures in similar, but
not identical, manners. Both the HO and HA planners moved the SMAC aircraft (fuel leak)
forward in the landing order, as demonstrated by lower values of SMAC 2 EART as
compared to the Baseline¹³. Also, both the HO and HA planners moved the FMAC (hydraulic
leak) backwards in the landing order, signified by larger values of FMAC 6 EART as
compared to the Baseline. Comparing the HO and HA planners, it can be seen that the HO
planner moved each aircraft to a greater degree than the HA planner (the HA planner moved
the SMAC forward 1 slot in the landing order as opposed to 11 slots in the HO plan, while
moving the FMAC backwards 6 slots as opposed to 7 in the HO plan). Second, the HA planner
completed the mission in less time than the HO planner, as signified by superior
performance in Mission Duration. The diagnostic measure WTQMS supports this view, showing
that the HA planner required aircraft to be in the Marshal Stack holding pattern for less
time overall.

5.2.2.2. Discussion

The results of testing for the Moderate scenario show mixed results for the performance
of the HA planner as compared to the HO planner. First, the HA planner differed
in its approach to rescheduling the two failed aircraft. While the HA planner followed the
instructions of the Expert User, moving the SMAC (fuel leak) forward in the landing order
and the FMAC (hydraulic leak) backwards, this was done to a lesser extent than by the
HO planner. This resulted in better performance for the HA planner with respect to the
FMAC but lower performance with respect to the SMAC. However, this landing order
led to a decrease in overall WTQMS and MD values and no increase in the number of
Fuel or Hydraulic Fluid Violations (which did not occur for any case).


¹³ Recall that EART is the duration of time from when the aircraft failure first occurs to
the point that the aircraft lands. Thus, a lower EART for a given planner signifies that
the aircraft landed earlier.
In this case, the determination as to which planning condition performed better in
general depends on the perspective of the analyst. If viewed in terms of overall
performance, the HA planner was better at optimizing the full system schedule, as revealed
by its ability to decrease the overall Mission Duration and WTQMS. If viewed in terms of
the number of FV and HFV, the planners performed equally well. However, if viewed in
terms of adherence to operator heuristics, the HA planner may be viewed by operators as
having inferior performance. Although the HA plan moved aircraft in the same direction as
the HO plan, the magnitudes of the aircraft movements differed, especially for the
movement of the high priority SMAC aircraft. While the HA schedule was able to maximize
its objectives of decreasing TAAT, WTQMS, and MD, the mismatch between the desired state
of SMAC 2 (as judged by the HO planning action) and the actions of the planning algorithm
may be undesirable.

Considering these three perspectives holistically provides additional insight with
regard to the SME heuristics. The SME heuristic concerning airborne aircraft failures
maintains that aircraft with severe failures (fuel leaks) be moved to the front of the
landing order to minimize the chance of the aircraft running out of fuel. Aircraft with
less critical failures (hydraulic leaks) are moved to the end of the landing order to
minimize the possibility of the aircraft crashing on landing, thus placing the other
airborne aircraft in harm's way. In this testing scenario, the HA planner followed these
suggestions, but not to the extent described in the heuristics. This less conservative
planning strategy resulted in an overall decrease in total mission time while not
incurring any more severe penalties in the form of FV or HFV. This suggests that the
actions ordered by the heuristics may be overly conservative. This is also an instance of
P/RA HSC systems working as desired - the human operator specified a set of high-level
guidelines, within which the planning algorithm was able to develop a locally optimal
plan.

These results could not have been seen if only a small set of metrics had been reviewed
in the course of examining performance. If the analysis had been limited only to the
measures of primary concern for the stakeholders - FV, HFV, and measures for the SMAC
and FMAC - the performance of the HA planner in optimizing the overall mission duration
would have gone unnoticed. It would instead have been judged as moderately effective
but still inferior to the HO planner. Only by the inclusion of additional metrics was
the effectiveness of the HA planner in optimizing the entire system seen.

5.2.3.  Complex  Scenario 

The Complex scenario (a mixed recovery and launch scenario) required the safe landing
of twenty aircraft currently in flight while simultaneously launching two additional
aircraft. During landing procedures, a single aircraft encountered a high priority
emergency (SMAC #2 encountered a fuel leak). Solving this problem required balancing the
need to land the aircraft immediately with the need to launch two others. The results of
this scenario showed that the HO planner was able to address both the failures and the
additional launches effectively, while the HA planner's solution further exacerbated
problems in the system. This is primarily revealed through an analysis of the number of
Fuel Violations (FV), total Mission Duration (MD), Total Aircraft Active Time (TAAT),
the error measures for the emergency aircraft (SMAC #2), Landing Zone Foul Time
(LZFT), and Wait Time in Queue in the Marshal Stack (WTQMS). Additional analysis of
the launch rates for Catapults 2 through 4, as well as the Total Catapult Launch Rate
(TCLR), aided in understanding the planner actions that led to these differences in
performance. The following two sections present the results of statistical testing,
followed by a discussion of these results.

5.2.3.1. Results of Statistical Testing

Table 16 presents a compilation of the statistical testing data for the Complex scenario.
This table lists the metric name, its desired magnitude (High or Low), and details of the
statistical testing between pairs of planning conditions. Significant values (p < 0.0010)
imply that the null hypothesis (H0: the distributions of the two planning conditions
are identical) was rejected. In these results, it can be seen that every statistical
comparison involving the HA planning condition (compared to either the B or HO conditions)
returned significance. The only cases where measures were seen to be statistically
equivalent were found in the B vs. HO comparison (LZFT, TATT, MD, C2LR). Reviewing the
relationships with statistically different data shows that there were differences in
performance concerning the emergency aircraft SMAC #2 (which encountered a fuel leak).
From the data it can be seen that the HO planner moved the SMAC forward (signified by
a lower value of SMAC 2 EART as compared to the Baseline). The HA solution, however,
either moved the SMAC backwards in the landing order, or its assignment of launch
tasks delayed the landing of the aircraft. Additionally, the HA planner exhibited more
Fuel Violations (FV) than either of the other conditions, with the HO planner
demonstrating the lowest FV value overall. Also, the HA planner required more time to
complete the mission than either of the other two cases (signified by poorer performance
on MD). In handling the launches of the requested aircraft, differences in planning
strategy also occurred, with the HO planner assigning aircraft only to Catapult 2 and the
HA planner assigning aircraft to Catapults 3 and 4 (signified by higher launch rates for
each case).


Table 16. Results of statistical testing for the Complex Scenario (* signifies significance
at α = 0.001; NP = Non-Parametric Mann-Whitney U Test; P = Parametric ANOVA).

              Desired    B vs. HO                     B vs. HA                     HO vs. HA
Metric        Magnitude  Test  p-value  Relationship  Test  p-value  Relationship  Test  p-value  Relationship
FV            LOW        P     n/a¹⁴    B>HO          NP    *        B<HA          NP    *        HO<HA
LZFT          LOW        P     p=0.478  B=HO          NP    *        B<HA          P     *        HO<HA
SMAC_2_AAT    LOW        NP    *        B>HO          NP    *        B<HA          NP    *        HO<HA
SMAC_2_EFR    HIGH       P     *        B<HO          P     *        B>HA          P     *        HO>HA
SMAC_2_EART   LOW        P     *        B>HO          P     *        B<HA          P     *        HO<HA
TATT          LOW        P     p=0.734  B=HO          P     *        B<HA          P     *        HO<HA
TAAT          LOW        P     *        B<HO          NP    *        B<HA          NP    *        HO<HA
TCAT          LOW        P     *        B>HO          P     *        B<HA          P     *        HO<HA
WTQMS         LOW        NP    *        B<HO          NP    *        B<HA          NP    *        HO<HA
MD            LOW        P     p=0.927  B=HO          NP    *        B<HA          NP    *        HO<HA
C2LR          HIGH       P     p=0.921  B=HO          P     *        B>HA          P     *        HO>HA
C3LR          HIGH       P     n/a¹⁴    B=HO=0        NP    *        B<HA          NP    *        HO<HA
C4LR          HIGH       P     n/a¹⁴    B=HO=0        NP    *        B<HA          NP    *        HO<HA
TCLR          HIGH       P     p=0.921  B=HO          NP    *        B>HA          NP    *        HO>HA
HV            LOW        NP    p=0.395  B=HO          NP    *        B>HA          NP    *        HO>HA
HV-D          LOW        NP    *        B>HO          NP    *        B>HA          NP    *        HO>HA
UIT           LOW        -     -        -             -     -        -             NP    *        HO<HA


5.2.3.2. Discussion

In the Complex scenario, the HA planner was outperformed by the HO planner in all
of the Mission Efficiency metrics, most importantly in measures of Fuel Violations and
Mission Duration. In fact, the only metrics in which the HA planner had better performance
were launch rate values for Catapults 3 and 4, and this only signifies a difference in
the assignment of launch tasks. These launch rate measures point to the root causes of
poor HA planner performance.

¹⁴ For these tests, values were either identical for all cases or zero for all cases,
making statistical testing infeasible.
Reviewing the boxplots for the launch rates for Catapults 2-4 and the total launch rate
(found in Figure K.11 through Figure K.14 in Appendix K), it can be seen that the HO
planner made assignments only to Catapult 2, while the HA planner made assignments
only to Catapults 3 and 4. In the majority of cases, assigning aircraft to these aft
catapults has no effect on airborne aircraft. However, for this scenario - where a set of
airborne aircraft, low on fuel, will imminently land - this assignment can create
significant repercussions. Recall that the aft catapults (Catapults 3 and 4) share deck
space with the Landing Zone and that these resources cannot be operated simultaneously.
Use of the aft catapults during landing operations may result in the waveoff of an aircraft
on approach. This occurs if the landing strip is unavailable when an approaching aircraft
reaches a certain threshold in its approach trajectory. If a waveoff is required, the
incoming aircraft returns to the Marshal Stack before attempting a second landing¹⁵. This
would then incur additional WTQMS, increase the total Mission Duration, and increase the
likelihood of a fuel violation. In the case of the HA replanning, each of these conditions
was seen, suggesting that the HA plan was incurring waveoffs due to its catapult
assignments.
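
The deck-space conflict described above can be sketched as a simple interval check. This is a minimal illustration under stated assumptions, not the DCAP simulation logic: the function names, the representation of aft-catapult use as (start, end) time intervals, and the fixed `stack_delay` penalty per waveoff are all hypothetical.

```python
def landing_zone_blocked(t, aft_catapult_busy):
    """True if an aft-catapult launch occupies the shared deck space at time t."""
    return any(start <= t < end for start, end in aft_catapult_busy)


def landing_time(threshold_time, approach_remaining, aft_catapult_busy, stack_delay):
    """Return (landing time, number of waveoffs) for one approaching aircraft.

    threshold_time     -- time the aircraft reaches the waveoff decision point
    approach_remaining -- time from the decision point to touchdown
    aft_catapult_busy  -- list of (start, end) intervals of aft-catapult use
    stack_delay        -- hypothetical time needed to return to the Marshal
                          Stack and reach the decision point again
    """
    if stack_delay <= 0:
        raise ValueError("stack_delay must be positive")
    waveoffs = 0
    t = threshold_time
    while landing_zone_blocked(t, aft_catapult_busy):
        waveoffs += 1
        t += stack_delay  # each waveoff adds holding time in the stack
    return t + approach_remaining, waveoffs
```

In this simplified model, every waveoff pushes the landing back by `stack_delay`, which is the mechanism by which aft-catapult assignments would raise WTQMS and Mission Duration and bring a low-fuel aircraft closer to a fuel violation.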

While several explanations can be developed concerning the reason for assigning
launches to the aft catapults, the true cause is that the planning algorithm did not account
for the interaction of the catapults and the landing strip and predicted that its assigned
launch tasks were the fastest available option. In this case, however, the fastest launch
configuration did not necessarily imply optimality. Because of the interactions of
resources on the deck, the best option (as shown in the HO plan) would have been to vacate
the aft catapults to deconflict aircraft landings. The realization of this fault in the
planner modeling could not have been reached if metrics had been limited only to Time-based
(ME-T) measures or to metrics specifically related to SME heuristics (such as the
handling of the SMAC). Limiting the analysis to only Coverage-based metrics would have
revealed differences in planning strategy but would have failed to demonstrate how these
differences affected the mission as a whole. The combination of the two led to a viable
explanation for how the planner's actions were determined. In addition to comparing the
performance of planners within an individual scenario, the performance of planners
across scenarios can also be compared. This comparison addresses the performance of
planners over the columns in Figure 17 and is discussed in the following section.

¹⁵ In reality, this would not occur in precisely this manner. Judgments on how to handle a
waved off aircraft require knowledge of the fuel state of the aircraft and the availability
of any airborne tankers for refueling. As the latter is not handled by the planning
algorithm at this time, the method of handling waveoffs was altered for the time being.

5.3.  Performance  Across  Scenarios 

Examining the performance of systems across complexity levels allows analysts to detect
potential issues with the brittleness of algorithms (their inability to account for
possible inputs and environmental conditions), while also determining the limits of the
human operator or algorithm with regard to increasing complexity levels. Typically,
algorithms are expected to perform better in cases requiring rapid, complex mathematical
optimizations [10], but brittleness may negate this likelihood. By testing across a
variety of inputs and complexity levels, potential instances of brittleness - due to either
improper environmental models or the inability to sense certain conditions - can be
uncovered. For the case of the DCAP testing program, complexity was determined according
to the number of Subject Matter Expert heuristics required to replan, providing a
consistent definition of complexity for three dissimilar test cases (one launch-only, one
landing-only, one combined launch/landing).
With this definition of complexity in place, performance across scenario conditions
can be judged in several manners. For the purposes of this discussion, performance is
based on a scoring function relating how well two planners compared to each other in a
given scenario (for instance, the HO versus the Baseline). This scoring function is given
by the following equation:

\[
\text{PerformanceScore} = \frac{N_{superior} - N_{inferior}}{N_{total}} \qquad (4)
\]

where $N_{superior}$ is the number of metrics where the HO planner outperformed the
Baseline, $N_{inferior}$ is the number of metrics where the HO planner underperformed with
regard to the Baseline, and $N_{total}$ is the total number of metrics used in statistical
testing for that scenario (12, 14, and 17 metrics for the Simple, Moderate, and Complex
scenarios, respectively). This returns a percentage score centered at 0, with +100% and
-100% denoting completely superior and completely inferior performance for a planner,
respectively. This scoring metric was applied for all planner comparisons across all three
scenarios, limited to the applicable metrics discussed in the previous sections. A visual
representation of these resulting scores is presented in Figure 21, with the above
equation lying on the vertical axis.
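
The scoring function in Equation (4) can be sketched in a few lines. The function name `performance_score` and the input checks are assumptions added for illustration, and the counts used in the usage note are hypothetical rather than values read from Figure 21.

```python
def performance_score(n_superior, n_inferior, n_total):
    """Equation (4) expressed as a percentage in the range [-100, 100].

    n_superior -- metrics where the planner outperformed the comparison
    n_inferior -- metrics where the planner underperformed the comparison
    n_total    -- all metrics used in statistical testing for the scenario
    Metrics that were statistically equivalent count toward n_total only.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_superior < 0 or n_inferior < 0 or n_superior + n_inferior > n_total:
        raise ValueError("superior and inferior counts must fit within n_total")
    return 100.0 * (n_superior - n_inferior) / n_total
```

For example, a hypothetical planner that is superior on 5 metrics and inferior on 2 out of 17 would score 100 * 3 / 17, or roughly +17.6%. Because equivalent metrics contribute only to the denominator, a planner that matches the comparison on most metrics is pulled toward zero regardless of how large the remaining differences are.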

In this figure, the HO planner is seen to have superior performance with respect to the
Baseline plan for the Complex scenario. Although this seems counterintuitive, it is due to
the nature of the Baseline case and the actions of the HO planner. Recall that for the
Baseline tests, no replanning occurred. For SMAC #2, which incurred a fuel leak, the HO
planner will replan and move this aircraft forward in the landing order; by default, the
HO plan will see superior performance for these metrics. In fact, these three metrics
(SMAC_2_EART, SMAC_2_AAT, and SMAC_2_EFR) are specifically the cause of the
net performance increase for the HO planner in the Complex scenario.
Figure 21. Relative performance comparison across scenarios.


Additionally, the contents of Figure 21 as a whole suggest that, for this P/RA system,
the typical assumption of superior algorithm performance under increasing complexity is
inappropriate. Observing the HO vs. B comparison, it can be seen that the performance of
the HO planner improved as complexity increased. The HA planner formed an inverted
"U" shape with respect to the Baseline, performing better in the Moderate scenario than
in the Complex and Simple scenarios (the reasons for this will be discussed later in this
section). The HA planner exhibited a similar pattern with respect to the HO planner.
Additionally, at no time did the performance of the HA planner become equal or
superior to the Baseline or HO planning conditions. In this case, the HA planner was
observed to perform poorly at both the higher and lower complexity levels. This data also
suggests that the human heuristics, as applied in this context and in these scenarios, are
adequate for addressing system complexity, as relative performance increased as
complexity increased.


These results, however, are dependent on both the metrics used in the analysis and the
definition of complexity, the latter of which will be discussed in a later paragraph.
Concerning the metrics used in analysis, the previous paragraph describes performance in
relation to all metrics used in testing (Figure 21). Figure 22 shows this graph again with
only metrics common to all three scenarios. For instance, catapult launch rates - which do
not exist for the Moderate scenario - have been removed, as have metrics for aircraft that
experienced failures.
Figure 22. Relative performance comparison across Common metrics.


Between Figure 21 (all metrics) and Figure 22 (common metrics), the relative performance
of each planner across scenarios is generally the same, with the HA planner
still demonstrating the inverted "U" of performance. However, the removal of the error
metrics for failed aircraft in the Moderate case has resulted in the HA planner having
net even performance with the HO planner in this scenario; this was previously a case
where the HA planner received a net negative rating on performance. The reasons for the
inverted "U" of HA planner performance are the same as those discussed in the previous
sections on individual scenarios and are recapitulated here - in the Simple scenario,
improper state modeling led the HA planner to solve a more complex problem than actually
existed, while in the Complex scenario, certain constraints modeling the complexity of
interactions on deck were not included. However, for the Moderate case, the HA planner
was able to achieve performance equivalent to that of the HO and Baseline planners, with
the HA planner actually achieving superior performance in the important metric of Mission
Duration. In this case, with the system accurately modeling the complexity of the world,
the planner's ability to forecast task durations and the interactions between vehicles
allowed the planner to create a more efficient plan. These conclusions on complexity,
however, are based on the definition and application of the SME heuristics; alternative
definitions of complexity may not yield the same conclusions. Basing complexity on the
maximum number of entities active in the system at any given time leads to a reordering
of relative complexity (Table 17), with the results shown in Figure 23.

Table  17.  Alternative  complexity  definitions  for  the  DCAP  test  scenarios. 


Original Ordering    Number of active entities    Order based on number of entities
Simple               ~100                         Moderate
Moderate             20-36                        Complex
Complex              22-46                        Simple


As noted in Table 17, replanning for the Simple scenario involved altering the activity
of a much larger number of entities (crew, aircraft, or ground vehicles) than the other
two scenarios. This is due to the fact that replanning the Simple scenario occurs when 18
aircraft are still on deck. Thus, with four crewmembers required per aircraft, five per
catapult, and additional ground deck, approximately 100 personnel are affected by a single
replan. Replanning for the Moderate scenario involves no replanning for the crew or
for UGVs, as all aircraft are still airborne. The Complex scenario differs from the
Moderate only in that two aircraft will launch from the carrier deck, thus affecting only
a small set of crew and support vehicles on deck. Using this perspective, the Simple
scenario is actually the most complex scenario, while the Moderate scenario is the least
complex. The relative performance ratings under this format (Figure 23) demonstrate that
the HA planner performed best at the minimal complexity level (now the Moderate case) with
performance decreasing as complexity increases, in direct opposition to the standard
assumptions of increased performance.
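
The reordering in Table 17 amounts to sorting the scenarios by their number of active entities. The sketch below is illustrative only: storing each range as a (low, high) pair and ranking by the upper bound are assumptions, and the Simple scenario, listed only as approximately 100, is stored as a single value.

```python
# Active-entity ranges from Table 17 as (low, high). The Simple scenario is
# given only as approximately 100, so both bounds are set to 100 here.
ACTIVE_ENTITIES = {
    "Simple": (100, 100),
    "Moderate": (20, 36),
    "Complex": (22, 46),
}


def order_by_entities(entities):
    """Return scenario names from least to most complex by upper bound."""
    return sorted(entities, key=lambda name: entities[name][1])


alternate_order = order_by_entities(ACTIVE_ENTITIES)
# alternate_order is ["Moderate", "Complex", "Simple"]
```

Ranking by the lower bound instead gives the same order for these three scenarios, so the reordering does not depend on which end of the range is used.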
Figure 23. Relative performance comparison across Scenarios in the alternate complexity
ordering (number of entities).
Additionally, the performance of the HA planner in adapting to complexity is also
dependent on the specific deficiencies noted in the previous sections on individual test
scenarios. For the case of the Simple scenario, the most likely cause of deficient
performance is errors in data logging between the algorithm and the simulation environment
for the acquisition of state data. This is not a direct error on the part of the human
operator interacting with the planning algorithm, the latter of which calculated a
theoretically near-optimal solution based on faulty state information and modeling. In
this case, a transient delay condition was treated as a long-term failure, negating the
assignment of tasks to a catapult. If this modeling assumption were corrected, or if the
HO planner had treated this transient condition in the same manner, the comparative
performance across complexity levels may have been much different. The same can be said
for the Complex condition, where the system's modeling of the interaction between landings
and takeoffs was shown to be deficient. For this scenario, the planner did not include a
model for the interaction between the aft catapults and the landing strip, leading to an
inappropriate assignment of resources that led to delays in landing airborne aircraft. The
inclusion of a model for landing strip-catapult interaction should correct the failings of
the planner in handling joint launch and recovery cases. However, simply instituting the
design changes is insufficient; a second testing cycle should be utilized, as described by
the Spiral Model in Figure 2. This round of testing should begin with the current scenarios
in order to validate that the design changes properly address the conditions discussed in
this chapter. Additionally, a set of new scenarios with new and different content should
be created. This may allow for the identification of additional system constraints
currently unknown to algorithm designers. Iteratively expanding the testing and
introducing scenarios with new content will allow analysts and designers to continue to
pinpoint possible refinements in the planning algorithm logic.

5.4.  Chapter  Summary 

This chapter has reviewed the testing data from DCAP simulation testing, covering
statistical analyses of the data and relative differences between planning conditions.
The first three sections reviewed the results of planner performance within each
scenario, with the fourth comparing the performance of the planners across complexity
levels. Results from the scenarios originally designated as Simple and Complex
demonstrated poor performance on the part of the HA system and provide system designers
with points of investigation for further design changes. The Moderate scenario
demonstrated that the HA planner has performance superior to the HO planner, taking
similar actions in replanning but in a less conservative fashion. In this case, the
actions of the HA planner, although they did not completely adhere to the SME heuristics,
resulted in overall improvements in global measures of mission performance.

Examining performance across scenarios showed that the HO planner was capable of adapting
to increased complexity within the scenarios, while the HA planner struggled with both
the Simple and Complex scenarios. This runs contrary to Fitts’ list [10] and the common
belief that planning algorithms are better able to handle complex conditions. However,
this may be due to the previously noted errors in the interfacing between the algorithm
and the simulation. These errors resulted in the HO and HA planners viewing the state of
the world differently. If the state information used by the automated planning algorithm
had been more accurately modeled, these variations in performance might not have been
seen. Furthermore, as shown in Chapter 5.3, these views on variation in performance are
influenced by both the selection of metrics used in analyzing performance and the
definition of complexity under which this analysis occurs. By using only common metrics
across all scenarios, the relative performance characteristics of the HO and HA planners
changed slightly. These changes showed a mix of increases and decreases in the relative
performance of planning conditions. Additionally, an alternate definition of complexity
provided a very different view of system performance. The original definition (based on
SME heuristics) showed the HA planner as having an inverted “U” of performance with
respect to both the HO and B planning conditions. An alternate definition of complexity,
based on the number of entities affected by replanning actions, showed that the HA
planner had linearly decreasing performance as the level of complexity increased.

6. Conclusions and Future Work


The goal of this thesis was to establish what metrics and methods were required for the
validation and verification of a P/RA HSC system and how these metrics can be used in the
iterative systems engineering design process. This requires not only validating the
performance of the individual subcomponents of the system (the human operator and the
automation), but also validating the quality of interaction between these subcomponents
and the effectiveness of their collaboration in forming solutions. These measurements
concern two steps in the systems engineering spiral model depicted in Figure 2: the
Integration and Test stage and the Acceptance Test stage. In addressing the Integration
and Test step, the metrics and methodology seek to measure the quality of integration of
the subcomponents (in this case, the human, the automated algorithm, and the
display/control interface) and their effects on one another. For the Acceptance Test
step, the metrics and testing protocol aid in the characterization of overall system
performance, providing comparison points between different system design alternatives or
with respect to a current system.

These  objectives  formed  the  basis  of  three  specific  research  objectives  for  this  thesis, 
which  are  reviewed  and  answered  in  the  next  subsection.  The  second  section  within  this 
chapter  highlights  the  limitations  and  future  work  related  to  this  thesis. 

6.1.  Research  Objectives  and  Findings 

This thesis sought to address three specific research questions concerning the design and
implementation of metrics and a testing protocol for the validation of P/RA HSC systems.
This section reviews the three research questions posed in Chapter 2 and details the
answers formulated in the course of this research.


What metrics are required for evaluating the performance of a Planning and Resource
Allocation Human Supervisory Control system as compared to manual planning?

The analysis of the three DCAP scenarios revealed a necessity for providing both
descriptive and diagnostic metrics in evaluating the performance of the HA planner as
compared to the manual HO plan. Descriptive metrics provide comparison points between
planning conditions, allowing analysts to define explicit differences in performance
between the systems (in this case, different planning conditions). In the case of DCAP,
these measures were defined based on the context of operations within the system. For
supervisors on the aircraft carrier deck, safety is of paramount importance; the safety
measures included the number of Fuel Violations and the ability to recover emergency
aircraft quickly. The second primary measure is the speed of operations, specifically
total Mission Duration, but additional measures addressing the time spent executing
certain subtasks (e.g., taxiing on deck or waiting in the Marshal Stack) were also
included. The emphasis on these two classes of measure may not be appropriate for other
P/RA systems (e.g., airport traffic routing or hospital operating room allocation); such
systems may require not only entirely different sets of metrics, but also a redefinition
of the priorities of individual metrics and metric classes.

Regardless of the specific nature of the system, the inclusion of primary metrics as
comparison points allows analysts to quantify the variations in performance between
planning conditions. However, these metrics may not allow a qualitative understanding of
why these variations occurred. The inclusion of diagnostic measures, such as subcomponent
task times (WTQMS, TATT, TAAT) and resource allocation metrics (catapult launch rates),
allows analysts to understand how the resulting actions of the planner led to undesirable
performance. In the DCAP analysis, the launch rate metrics for the Simple scenario were
diagnostic in that they enabled analysts to determine that an unbalanced distribution of
launch assignments led to a decrease in overall launch rate and an increase in final
Mission Duration. A further review of the launch assignments then revealed an error on
the part of the planning algorithm in modeling a transient failure condition on the deck.
In the case of the Complex scenario, catapult launch rates were again used as diagnostics
in determining the root causes of poor planner performance. In this case, however, the
launch rates revealed a significant deficiency: the algorithm did not model interactions
between the aft catapults and the landing strip. Although the algorithm took actions that
were predicted to take the minimum time to launch, launches did not occur quickly enough.
This resulted in conflicts on the landing strip and led to a series of aircraft waveoffs
on landing.
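
As an illustration of how a launch-rate diagnostic of this kind might be computed, the
following minimal Python sketch summarizes a launch log by catapult. The log format, the
catapult labels, the function name `launch_diagnostics`, and all numeric values are
hypothetical and are not taken from the DCAP software or its test data; the sketch only
shows the general idea of pairing an overall rate with the per-catapult distribution.

```python
from collections import Counter


def launch_diagnostics(launch_log, elapsed_minutes):
    """Summarize a launch log for diagnostic purposes.

    launch_log: list of (catapult_id, launch_time_minutes) tuples (hypothetical format).
    elapsed_minutes: length of the observation window, measured from scenario start.
    """
    if elapsed_minutes <= 0 or not launch_log:
        raise ValueError("need a positive window and at least one launch")
    counts = Counter(catapult for catapult, _ in launch_log)
    total = len(launch_log)
    return {
        # Number of launches assigned to each catapult.
        "counts": dict(counts),
        # Fraction of all launches handled by each catapult; a value near 1.0
        # for one catapult indicates an unbalanced assignment.
        "share": {catapult: n / total for catapult, n in counts.items()},
        # Overall launch rate across the deck.
        "launches_per_minute": total / elapsed_minutes,
    }


# Invented example: three of four launches go to one catapult.
EXAMPLE_LOG = [("Cat1", 1.0), ("Cat1", 2.0), ("Cat3", 3.0), ("Cat1", 4.0)]
EXAMPLE_SUMMARY = launch_diagnostics(EXAMPLE_LOG, elapsed_minutes=4.0)
```

A descriptive metric such as Mission Duration would show only that the mission ran long;
the per-catapult share is the kind of quantity that points toward an assignment problem.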


How  can  these  metrics  assess  the  variations  in  performance  of  human  and  combined 
human-algorithm  planning  agents? 

For the DCAP system, an analysis of variations in performance took two forms. The first
addressed variations in performance for all planners within a specific scenario. These
comparisons were primarily facilitated by the usage of descriptive measures of
performance, applied consistently to each planner within each trial. Both qualitative and
statistical analyses were utilized, each contributing to the determination of
performance. Statistical analyses aid in identifying differences within data, but do not
provide guidance in determining the form of the difference. In this regard, the inclusion
of qualitative analyses to classify the type of difference significantly aided in
distinguishing cases of superior and inferior performance. Only by including the
additional qualitative review of differences in measures could variations in performance
truly be depicted.

However, this testing protocol was purposefully limited in its ability to test variations
in performance with respect to human input through the usage of a single Expert User. The
scripting of this user’s actions allowed the testing program to highlight specific
deficiencies on the part of the planning algorithm and its responses to varying
complexity levels, at the cost of investigating system performance with respect to
variations in human input. This limits the generalizability of the test results to the
larger class of potential users. However, if this testing protocol were repeated with a
single scenario (fixing the test protocol on this axis) and using a larger pool of human
test users, a similar examination of performance could be performed. In this manner, the
testing protocol can be viewed as existing in three axes: operator, algorithm, and
scenario. Comparing the performance of any condition requires fixing at least two of
these axes. In the current testing, the operator axis was fixed, and comparisons occurred
by fixing either the scenario axis (comparing performance between planners) or the
algorithm axis (comparing planner performance across scenarios).
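
The three-axis view can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the record layout, the axis labels in `AXES`, the
function name `compare_along`, and every numeric value in `TRIALS` are hypothetical and
do not come from the DCAP test data.

```python
from collections import defaultdict
from statistics import mean

# The three axes of the testing protocol, as named in the text.
AXES = ("operator", "algorithm", "scenario")

# Hypothetical trial records: (operator, algorithm, scenario, mission_duration_min).
# All values are invented for illustration only.
TRIALS = [
    ("expert", "B", "Simple", 24.0), ("expert", "B", "Simple", 26.0),
    ("expert", "HO", "Simple", 27.0), ("expert", "HO", "Simple", 29.0),
    ("expert", "HA", "Simple", 29.0), ("expert", "HA", "Simple", 31.0),
    ("expert", "HO", "Moderate", 40.0), ("expert", "HA", "Moderate", 38.0),
]


def compare_along(trials, free_axis, fixed):
    """Average a metric along one free axis while the other two axes are held fixed."""
    if free_axis not in AXES or set(fixed) != set(AXES) - {free_axis}:
        raise ValueError("fix exactly the two axes other than the free axis")
    groups = defaultdict(list)
    for *levels, value in trials:
        record = dict(zip(AXES, levels))
        # Keep only trials that match the fixed levels on both fixed axes.
        if all(record[axis] == level for axis, level in fixed.items()):
            groups[record[free_axis]].append(value)
    return {level: mean(values) for level, values in groups.items()}


# Fix operator and scenario, then compare across planning conditions.
BY_PLANNER = compare_along(TRIALS, "algorithm", {"operator": "expert", "scenario": "Simple"})
```

Fixing the operator and algorithm instead, and freeing the scenario axis, gives the
cross-scenario comparison; a study with many users would free the operator axis.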

The utilization of realistic scenarios that addressed not only differing complexity
levels, but also different planning environments (launch-only, recovery-only, and mixed),
allowed the identification of several variations in planner performance. For the Simple
scenario, a deficiency in planning operations was seen; the planning algorithm did not
properly model a transient failure on deck. This resulted in an underutilization of
resources on deck, lowering the launch rate of the system and leading to an increase in
overall mission duration. In the Moderate scenario, the HO and HA planners showed
broadly similar performance with some notable differences. The HA planner made similar
adjustments to the two emergency aircraft in the scenario, but did not move aircraft to
the same degree as the HO planner. However, the HA planner’s actions resulted in lower
overall Mission Duration while not incurring any more Fuel or Hydraulic Fluid Violations.
Lastly, in the Complex scenario, a deficiency different from that of the Simple case was
seen. Here, the planning algorithm did not model the interactions between resources on
the deck, instead selecting to launch aircraft from the aft catapult and creating
conflicts with incoming aircraft attempting to land. While these actions were the result
of attempting to launch aircraft in the minimum amount of time, which may be optimal for
many cases, this choice of action was suboptimal in this scenario. Without the inclusion
of all three cases, some significant information regarding the performance of the
planners (both good and bad) would not have been discovered.

This testing program also revealed variations in the performance of the deck environment
itself, despite the standardization of user replanning actions and initial scenario
states. For instance, HV-D showed a large distribution for the Moderate, Baseline case
while having a relatively small distribution for the Simple, HO case. This occurred even
though the scenario utilized the same initial conditions and the user took the same
replanning actions in every trial. This variation in metric values was also seen for the
Baseline case, where the user took no replanning actions whatsoever. The dynamics of the
carrier deck environment introduce further stochasticity into system performance, and the
testing protocol must include enough trials to capture this variability effectively. With
too few trials, the collected data may fall in the tails of the distribution, leading to
under- or over-predictions of performance. This may then lead to errors in the
statistical comparisons and incorrect conclusions on planner performance.
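
The effect of trial count on the reliability of a measured mean can be illustrated with a
small Python sketch. It assumes, purely for illustration, that a metric such as Mission
Duration is Gaussian with a mean of 30 minutes and a standard deviation of 4 minutes;
these values, the function names, and the seeding scheme are hypothetical and are not
drawn from the DCAP results.

```python
import random
from statistics import mean, stdev


def simulate_trials(n_trials, true_mean=30.0, true_sd=4.0, seed=0):
    """Draw n_trials hypothetical metric values from a Gaussian distribution."""
    rng = random.Random(seed)
    return [rng.gauss(true_mean, true_sd) for _ in range(n_trials)]


def standard_error(samples):
    """Estimated standard error of the sample mean."""
    return stdev(samples) / len(samples) ** 0.5


def spread_of_sample_means(n_trials, n_repeats=200):
    """Standard deviation of the sample mean over repeated hypothetical test campaigns.

    Each campaign uses a different seed, so the spread shows how far a single
    campaign's mean can drift from the true mean for a given trial count.
    """
    means = [mean(simulate_trials(n_trials, seed=s)) for s in range(n_repeats)]
    return stdev(means)


if __name__ == "__main__":
    for n in (5, 30):
        print(n, round(spread_of_sample_means(n), 2))
```

Under these assumptions the spread of campaign means shrinks roughly with the square root
of the number of trials, so a campaign with only a handful of trials is much more likely
to report a mean that sits in the tail of the underlying distribution.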


How  can  these  metrics  predict  system  feasibility  and  highlight  design  flaws? 

The prediction of system feasibility and the determination of design flaws are
qualitative determinations on the part of the analyst, requiring knowledge of both the
dynamics of the environment and the capabilities of the planning systems, supplemented by
data generated in the testing phase. System feasibility is best served by descriptive
measures of performance, while design flaws are best pinpointed by diagnostic measures.

Concerning system feasibility, the results of the descriptive performance measures,
tempered by the analyst’s understanding of system dynamics, provide evidence for or
against system acceptance. The inclusion of multiple comparison points (in this case, the
HO and B planning conditions) effectively grounds the analyst with regard to expected
system performance. The results from the Moderate scenario suggested that the HA planner
is indeed capable of performing as well as the HO planner in addressing failures, albeit
through a different planning strategy. In this case, the HA planner took actions similar
to the HO planner (making a similar reorganization of the landing order) but ordered less
overall movement of the two emergency aircraft. This difference in strategy was revealed
through the use of a variety of metrics and by comparing the performance of the system to
the HO planner. The results from the Simple scenario, taken at face value, suggest that
the HA planner was completely infeasible for use in this manner. However, the definition
of the Simple scenario and its method of execution implied that no planner would be able
to achieve performance equal to or better than the Baseline. Without this understanding
(either foreknowledge on the part of the designer, or through observing the performance
of the state-of-the-art HO planner), this scenario would have demonstrated that neither
the HA nor the HO planner is adequate for use in the real world. In this case, the
inclusion of the HO planner as a comparison point grounded expectations for current
system performance.

Design flaws are highlighted by a combination of the analyst’s understanding of the
system dynamics and the application of diagnostic measures seen in the system. In
utilizing measures as diagnostics of performance, explanations for the variation in
performance can be constructed and can guide analysts through the next design iteration.
Again reviewing the results from the Simple scenario, the diagnostic measures of catapult
launch rate revealed a possible error in how the planner viewed the availability of the
catapults (the improper modeling of a transient failure), providing a specific point of
investigation for the designers. This inadequacy is likely due to a failure in properly
addressing the state information being acquired from the simulation data and is likely
not due to actions of the human or algorithm specifically.

However, in the Complex scenario, the use of diagnostic measures for catapult launch
rates revealed a possible logical flaw in the algorithm’s model of system dynamics. The
model used by the algorithm suggested that the actions in the proposed schedule would be
optimal, providing maximum launch rate and minimum operational time. However, the
execution of the schedule demonstrated that this was not the case, providing a point of
investigation for the design team. In both of these cases, the results of the metrics
applied to the system and the subsequent statistical and qualitative analyses provided
evidence to support a hypothesis. The analysts’ understanding of the system dynamics and
the functions of the planning algorithm and the human operator then frame these
explanations.

The poor performance of the HA planner in both the Simple and Complex scenarios
highlights instances of brittleness within the algorithm and its data collection modules.
Each of these discoveries was highly specific to the case presented: the errors in state
information in the Simple scenario would not have been seen had the crew moved through
Catapult 2’s area one minute earlier or later. Similarly, the Complex scenario tested a
unique boundary case that, while a realistic and important test case, may not be typical.
The majority of operations are similar to the Simple and Moderate cases, neither of which
would have revealed the failure of the algorithm to compensate for conflicts between the
aft catapults and landing strip. For the DCAP system, the utilization of a variety of
testing scenarios incorporating a broad range of possible circumstances aided in
revealing specific brittle features of the algorithm.

6.2.  Limitations  and  Future  Work 

Although the presented methodology was successful in defining a set of metrics that
allowed analysts to define the performance of the planners and to identify a series of
design changes for the HA planning algorithm, it is not without limitations. The
definition and use of large numbers of metrics (as was done here) can be time-consuming
and expensive. The utilization of all metrics in a statistical analysis would contribute
to a rapid degradation of family-wise significance, which can be ameliorated through the
use of statistical reduction techniques such as Principal Components Analysis (PCA).
However, performing a PCA for a large number of tests, scenarios, and planning conditions
requires even further time and resources. Additionally, removing metrics based on strict
adherence to the PCA may lead to heterogeneous data sets; that is, metrics that show high
correlation in one test scenario may not be correlated in another. A possible extension
of this work is improving the ability to identify key metrics early in the process,
preventing excess data collection and reducing the number of man-hours spent in the
analytical process.
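
The degradation of family-wise significance can be made concrete with a short Python
sketch. It assumes that the individual tests are independent, which is a simplification
(correlated metrics, as discussed above, would behave differently), and it shows the
Bonferroni adjustment only as a familiar point of comparison; this thesis relies on
metric reduction through PCA rather than on that correction. The function names are
illustrative.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests,
    each run at per-test significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test level that keeps the family-wise rate at or below alpha
    (Bonferroni adjustment, shown for comparison only)."""
    return alpha / n_tests


if __name__ == "__main__":
    for m in (1, 5, 20):
        print(m, round(family_wise_error_rate(0.05, m), 3))
```

With a per-test level of 0.05, twenty independent tests give a family-wise error rate of
roughly 0.64, which is why either the number of metrics tested or the per-test threshold
has to be reduced.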

Additionally, this methodology has only been applied to a single P/RA system example,
using a single deterministic resource allocation algorithm (an Integer Linear Program),
operating in a unique environment and using a single Expert User in its testing program.
These are all very specific conditions that occurred in the testing of this system, and
the generalizability of the methodology may have suffered because of this.

The inclusion of only a single deterministic algorithm for resource allocation limits
generalizability to pure path planning systems or to systems utilizing Heuristic or
Probabilistic algorithms. Ongoing research involves the creation of two additional
planning algorithms within these categories. These algorithms are of different formats
than the ILP used here (both are non-deterministic) and have different strengths and
weaknesses concerning failure handling and guarantees of optimality. Future research
should apply this methodology to this pair of algorithms to determine if any changes in
approach are needed for these non-deterministic algorithms.

Additionally, this testing program used a single Expert User, a member of the design team
intimately familiar with the interface. For the purposes of algorithm validation, this is
an acceptable practice, as the inclusion of this user minimizes user error and
standardizes the input to the system. The lack of relative “noise” in the input of the
human operator allows for a direct assessment of the performance of the algorithm. While
this limits the confounds that would have arisen from utilizing multiple users, much
future work remains to be done to address the use of the DCAP system by the user
population at large and in realistic circumstances.

Despite the fidelity of the simulation environment, it is difficult to precisely recreate
the exact conditions under which these decisions are made. The physical environment, time
pressure, and stress associated with operators performing these actions were not
addressed in this testing program. Additionally, the final implementation of this P/RA
HSC system will most likely include several different operators with various skill
levels, experiences, and planning heuristics. Before accepting the system for final
implementation, a larger testing program including a wide spectrum of potential users
should be performed and should utilize many of the measures negated in Chapter 4.2 by the
inclusion of the Expert User. This testing would specifically address measures of
Collaboration, Human Behavior Precursors, and Human Behavior Efficiency. This testing
would also address the performance of the planner and the system as a whole across
multiple users.


This testing protocol cannot truly be finalized until its application has been tested
across a broad range and depth of applicable P/RA HSC systems. While this testing
protocol has proved effective in allowing analysts to determine the differences between
manual (HO) and combined human-automation (HA) planning for the DCAP system, the DCAP
system was a unique and highly specific test case. A great deal of further work remains
in validating and verifying this methodology across a larger sample size of P/RA HSC
systems, algorithm formats, and user pools.

6.3.  Thesis  Summary 

This thesis has addressed the definition of metrics and a testing protocol for P/RA HSC
systems, with a focus on validating the performance of the combined human-algorithm
planning system. This thesis began with a review of an example HSC metric framework and
three main classes of automated algorithms used in P/RA HSC systems. This thesis then
reviewed various measures associated with the classes defined in the metric framework,
providing examples of their application in various HSC systems or in algorithm validation
and verification. After providing background on an example P/RA HSC system, the
definition of specific metrics and the protocol for testing the DCAP system were
provided. This included descriptions of three testing scenarios across varying levels of
complexity and explanations of how an Expert User interacted with the system during the
testing scenario. The results of system testing were presented and discussions of the
performance of the various planning conditions were provided. Discussions of planner
performance across levels of complexity were also offered. Lastly, the ability of this
work to address three main research objectives was discussed, as were the limitations of
and future work related to this research. While the metrics and protocol have proved
successful in this application to this specific system, there is no guarantee that these
metrics and the testing protocol are optimal for other systems. Repeated application of
this methodology to a variety of other P/RA HSC system formats and algorithm forms will
provide additional insight into the successes and limitations of the metrics and protocol
used here.




Appendix A - Gaussian Processing Times for DCAP Simulation


Table A.1. List of Gaussian processing times for the DCAP simulation.

Task     Description                      Mean         Standard Deviation
Fueling  Fuel flow rate                   600 lb/min   136 lb/min
Takeoff  Attach aircraft to catapult      1 minute     15 seconds
Takeoff  Accelerate to speed              3.5 seconds  0.5 seconds
Landing  Time to hit wire and decelerate  4 seconds    1 second
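
As a rough illustration of how Gaussian parameters of this kind might be used to generate
task durations, the following Python sketch samples from the means and standard
deviations listed in Table A.1. How the DCAP simulation actually draws and applies these
values is not described here; the dictionary keys, the function names, and the choice to
redraw non-positive samples are all assumptions made for this sketch.

```python
import random

# (mean, standard deviation) pairs from Table A.1, with times converted to seconds.
GAUSSIAN_PARAMS = {
    "fuel_flow_lb_per_min": (600.0, 136.0),
    "attach_to_catapult_s": (60.0, 15.0),
    "accelerate_to_speed_s": (3.5, 0.5),
    "hit_wire_and_decelerate_s": (4.0, 1.0),
}


def sample_positive(name, rng):
    """Draw one Gaussian sample, redrawing if it is not positive.

    Redrawing is an assumption of this sketch: a Gaussian has unbounded tails,
    and a negative flow rate or duration would be physically meaningless.
    """
    mu, sigma = GAUSSIAN_PARAMS[name]
    while True:
        value = rng.gauss(mu, sigma)
        if value > 0:
            return value


def sample_launch_time_s(rng):
    """Hypothetical launch time: catapult attachment plus acceleration, in seconds."""
    return (sample_positive("attach_to_catapult_s", rng)
            + sample_positive("accelerate_to_speed_s", rng))


def sample_fueling_time_min(fuel_lb, rng):
    """Hypothetical time in minutes to load fuel_lb pounds at a sampled flow rate."""
    return fuel_lb / sample_positive("fuel_flow_lb_per_min", rng)


if __name__ == "__main__":
    rng = random.Random(42)
    print(sample_launch_time_s(rng), sample_fueling_time_min(6000.0, rng))
```

Because each call draws fresh samples, repeated runs of the same scenario produce
different task durations, which is consistent with the trial-to-trial variability noted
in the test results.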

Appendix B - Hybrid Cognitive Task Analysis for DCAP

The goal of a Hybrid Cognitive Task Analysis (hCTA) is the creation of a set of
information, display, and functional requirements for the interface of a complex system,
beginning with a high-level scenario task description [113]. The process consists of five
steps: 1) a Scenario Task Overview (STO), 2) a set of Event Flow Diagrams (EFDs), 3)
generation of Situational Awareness Requirements (SARs), 4) generation of Decision
Ladders (DLs) with corresponding display requirements, and lastly 5) generation of
Information and Functional Requirements (IRs and FRs). The hCTA process is often used in
cases where the system being designed is revolutionary (in that it has no predecessors)
and has previously been used in the design of control interfaces for unmanned underwater
vehicles (UUVs) [132], submarine surface collision avoidance [133], and interactive
scheduling for commuter trains [134]. The following sections describe the steps used in
creating the DCAP hCTA, beginning with the Scenario Task Overview.

B.1 Preliminary Analysis

B.1.1 Scenario Task Overview (STO)

The Scenario Task Overview decomposes the full mission definition into a series of phases
to be completed by the human operator. In most cases, this results in a relatively linear
flow of phases from one task to the next. For instance, in the case of the rail
scheduling system in [134], which dealt with the management of the schedule of a single
train traveling between two points, the phases were broken into Pre-departure, Departure,
En Route, and Arrival (termination) phases. For the aircraft carrier operating
environment, the operator primarily engages in two roles: Monitoring the environment and
Replanning when required. In this case, replanning consists of reassigning aircraft to
different resources, altering the order of assignment to a single resource, or a
combination of both. At this time, en route mission and strike planning are not included,
but may be included in future work. The system instead focuses on the replanning of tasks
only for aircraft in the immediate vicinity of the aircraft carrier, focusing on tasks
that require usage of resources on deck. Within these two phases of operation, a total of
19 subtasks were defined for management of aircraft carrier deck operations. These are
presented in Table B.1 (Monitoring phase) and Table B.2 (Replanning phase).


Table B.1. Scenario Task Overview - Mission phase and subtasks

Phase | Task Number | Task | Related EFD Symbol
Monitoring | 1 | Observe crew motion on deck | L1
Monitoring | 2 | Issue halt to operations if unsafe | P1
Monitoring | 3 | Observe operations of airborne aircraft | L2
Monitoring | 4 | Issue alert if failure occurs | P2
Monitoring | 5 | Observe state of deck resources | L3
Monitoring | 6 | Issue alert if failure occurs | P2
Monitoring | 7 | Monitor total shift time of the crew | L4
Monitoring | 8 | If over working time, document and account for in future personnel scheduling | P3
Monitoring | 9 | Monitor total shift time of all pilots | L5
Monitoring | 10 | If over working time, document and account in future mission scheduling | P4
Monitoring | 11 | Monitor status of the schedule | L6
Monitoring | 12 | If schedule degrades due to delay accumulation, judge need for replan | P5
Monitoring | 13 | If replan need arises, initiate replan | D1

Table B.2. Scenario Task Overview - Replanning phase and subtasks.

Phase | Task Number | Task | Related EFD Symbol
Replanning | 1 | Determine what item has failed | P6
Replanning | 2 | Determine the type/extent of failure | P7
Replanning | 3 | Determine affected aircraft | D2
Replanning | 4 | Reassign tasks for affected deck aircraft | D3
Replanning | 5 | Reassign landing positions for affected airborne aircraft | D4
Replanning | 6 | Communicate schedule changes to all personnel | P9

B.1.2  Event  Flow  Diagrams  (EFDs) 

An Event Flow Diagram places the subtasks that comprise the STO phases into a
process flow diagram, highlighting the temporal constraints and relationships between
subtask elements. The subtasks are broken into Processes, Decisions, Loops, Phases, and
Assumptions, with arrows denoting the transition between elements (Figure B.1). These
elements are then assigned alphanumeric labels for traceability. Thus, a future
Information Requirement (IR1) can be linked to a specific Loop (L3), Decision (D5), or
Process (P12). Three different EFDs were created, totaling six Loops, thirteen Processes,
and seven Decision elements.
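
The labeling scheme lends itself to a simple traceability record. The following is a minimal sketch, not part of DCAP: the names `Requirement`, `element_kind`, and `trace` are hypothetical, and only the label prefixes (L, P, D) and the example link from IR1 to L3, D5, and P12 are taken from the text above.

```python
from dataclasses import dataclass, field

# Prefix-to-element mapping taken from the EFD labeling scheme in the text.
ELEMENT_KINDS = {"L": "Loop", "P": "Process", "D": "Decision"}


def element_kind(label: str) -> str:
    """Return the EFD element type for a label such as 'L3' or 'P12'."""
    prefix, number = label[:1], label[1:]
    if prefix not in ELEMENT_KINDS or not number.isdigit():
        raise ValueError(f"not a valid EFD label: {label!r}")
    return ELEMENT_KINDS[prefix]


@dataclass
class Requirement:
    """Hypothetical record linking one requirement to its EFD elements."""
    req_id: str
    description: str
    sources: list = field(default_factory=list)

    def trace(self) -> dict:
        """Map each source label to the kind of EFD element it names."""
        return {label: element_kind(label) for label in self.sources}


# Example link from the text: IR1 traced to a Loop, a Decision, and a Process.
ir1 = Requirement("IR1", "illustrative requirement", ["L3", "D5", "P12"])
print(ir1.trace())
```

Keeping the source labels on each requirement means a later change to one EFD element can be followed forward to every requirement that depends on it.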


Figure  B.  1.  Elements  used  in  Event  Flow  Diagrams. 
The first EFD (Figure B.2) describes the operator in the Monitoring phase, consisting
of a set of concurrent monitoring Loops (L1-L6) and a single Decision element regarding
the need to replan (D1). If this decision returns a “yes” answer, the operator moves to the
replanning phase. The second EFD (Figure B.3) describes the basic replanning process
of operators in the current operating paradigm. Although there are few items within this
diagram, the important feature is the existence of three decision-making loops (Decision
blocks contained within the loop symbol). First, for all i aircraft in the system, the
operator must decide the extent to which a failure condition affects each aircraft. This
provides the operator with a list of potential aircraft to reschedule. The next two decision
loops separate this list into groups of j deck and k airborne aircraft, which experience
different failures, have different concerns, and have different methods of redress given the
failure in the system. For each aircraft, the operator must determine if a new plan must be
generated for the aircraft. Once all plans are generated, these are transmitted to the
personnel on deck for implementation (P9).
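
The first of the three decision loops, and the split that feeds the other two, can be sketched as follows. This is an illustration under stated assumptions only: the `Aircraft` record, its field names, and `partition_affected` are hypothetical, and the operator's judgment about whether an aircraft is affected is reduced here to a boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aircraft:
    """Hypothetical aircraft record; the field names are assumptions."""
    name: str
    airborne: bool
    affected: bool  # stands in for the operator's "is this aircraft affected?" decision


def partition_affected(fleet):
    """Screen all i aircraft, then split the affected ones into deck (j) and airborne (k)."""
    affected = [a for a in fleet if a.affected]
    deck = [a for a in affected if not a.airborne]
    airborne = [a for a in affected if a.airborne]
    return deck, airborne


fleet = [
    Aircraft("A1", airborne=False, affected=True),
    Aircraft("A2", airborne=True, affected=True),
    Aircraft("A3", airborne=False, affected=False),
]
deck, airborne = partition_affected(fleet)
print([a.name for a in deck], [a.name for a in airborne])
```

Each resulting group would then be handled by its own loop, since deck and airborne aircraft call for different forms of redress.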

For each of the two EFDs presented here, Decision Ladders (DLs) were created in order
to determine specific informational and functional requirements for the decision-making
process. These DLs also guided the inclusion of the automated planning system
within DCAP, which required the creation of an additional STO Phase, a third EFD, and
further DLs, all of which are detailed in the following sections.

Figure B.2. Event Flow Diagram 1 - Monitoring.

Figure B.3. Event Flow Diagram 2 - Replanning.

B.1.3 Decision Ladders


For each Decision element that appears in a given EFD, a corresponding Decision
Ladder (DL) is used to further detail the knowledge and information processing tasks
required for a human user to reach the decision [135]. Decision Ladders begin with some
form of alert to the operator (be it endogenous or exogenous) and delineate the specific
tasks and information requirements needed to work through the decision-making process.
In doing so, DLs move from skill-based to knowledge-based behavior [136]. By
tracking the steps in the decision-making process, a set of information requirements can
be generated. Information Requirements describe individual pieces of information that must
be presented within the system interface. For instance, the operator may need to replan a
schedule due to the emergence of a failure. The operator then must know the type of
failure and the afflicted aircraft or resource. These two items, failure type and affected
resource, are two separate Information Requirements for the display. Later in the process,
the operator may need to submit a new schedule to the system. A Functional Requirement
would then be a control item that allows this interaction. A total of four DLs were created
for the EFDs in Figure B.2 and Figure B.3. These DLs appear in Figure B.4 through
Figure B.7.

Figure B.4. Decision Ladder 1 - Necessity of Replanning (from EFD 1).

Figure B.5. Decision Ladder 2 - Is aircraft affected by failure (from EFD 2).

Figure B.6. Decision Ladder 3 - New task assignment for aircraft j (from EFD 2).

Figure B.7. Decision Ladder 4 - Define landing position for aircraft k (from EFD 2).

In examining the construction of the STO and the resulting DLs, it was identified that
an automated planning algorithm could support the efforts of the human operator in
accumulating task data for all aircraft in the system, determining the effects of resource
failures on these aircraft, and judging the future performance and relative effectiveness of
a replanned schedule. The inclusion of such a system would offload much of the decision
loop processes in the Replanning EFD (Figure B.3), but would fundamentally change
how the operator interacts with the system. As such, a third STO Phase, DCAP Replanning,
was created, with a subsequent EFD and set of DLs. These are detailed in the
subsequent sections.

B.2  Secondary  Analysis 

The inclusion of an automated planning algorithm to offload operator replanning
tasks changes the tasks and subtasks that an operator performs during the replanning
process. Rather than creating specific schedules for aircraft, the operator instead manages
inputs to and guides the performance of an automated algorithm. The DCAP system was
designed to provide the operator with two levels of interaction, global and local.
Prior research has shown that, while working on a local level provides better
performance for an operator, many operators attempt to manage the global priorities of the
system [137, 138]. The DCAP system utilizes both aspects, allowing users to rank a set of
personnel group variables through a drag-and-drop interface [139] to apply global
rankings, while allowing users to specify priority levels and suggest schedules for individual
aircraft. This led to the creation of additional STO phases, EFDs, and DLs.

B.2.1 Scenario Task Overview


An  additional  STO  Phase  -  DCAP  Replanning  -  was  created  and  addresses  how  the 
operator  would  interact  with  the  system  in  order  to  input  these  global  and  local  planning 
constraints.  Table  B.  3  contains  a  list  of  the  subtask  elements  and  related  EFD  elements 
for  this  new  phase. 


Table B.3. Scenario Task Overview - DCAP Replanning phase and subtasks

Phase | Task Number | Task | Related EFD Symbol
DCAP Replanning | 1 | Determine what item has failed | P6
DCAP Replanning | 2 | Determine the type/extent of failure | P7
DCAP Replanning | 3 | Define personnel group priorities | P10
DCAP Replanning | 4 | Prioritize and suggest schedules for specific aircraft | L7
DCAP Replanning | 5 | Select aircraft | P11
DCAP Replanning | 6 | Determine priority status of aircraft i | D5
DCAP Replanning | 7 | Define aircraft as priority | P12
DCAP Replanning | 8 | Determine existence of desired schedule | D6
DCAP Replanning | 9 | Define suggested schedule | P13
DCAP Replanning | 10 | Determine acceptability of proposed schedule | D7

B.2.2  Event  Flow  Diagrams 

An additional EFD was created to incorporate the subtasks contained within the new
DCAP Replanning STO phase. This appears in Figure B.8. In this case, only one major
decision loop occurs, in which operators must examine all i aircraft in the system to
determine if any aircraft specifically requires a new schedule suggestion.

Figure B.8. Event Flow Diagram 3 - DCAP Replanning.


B.2.3  Secondary  Decision  Ladders 


The DCAP Replanning EFD contains three new Decision elements, addressing how
users determine the priority status of an aircraft, whether the operator has a suggested
schedule for the aircraft, and whether the returned schedule proposal is acceptable. The
DLs for these elements appear in Figure B.9 through Figure B.11.


Figure  B.  9.  Decision  Ladder  5  -  Is  aircraft  i  a  priority  case  (from  EFD  3). 

Figure B.10. Decision Ladder 6 - Suggested schedule for aircraft i (from EFD 3).

[Figure B.11 content, recovered stage labels only: ACTIVATION (algorithm has returned a
proposed schedule); INTERPRET (overall performance of the proposed schedule); EVALUATE
(deficiencies and areas for improvement); EXTRAPOLATE (resource allocations and future
temporal performance); DETERMINE (ability to recover from failures, resource allocation,
and acceptable landing order); EXECUTE (accept/reject/modify proposed schedule).]

Figure B.11. Decision Ladder 7 - Is proposed schedule acceptable (from EFD 3).

B.2.4 Situational Awareness Requirements


From these EFDs, a set of Situational Awareness Requirements (SARs) can be generated,
describing specific pieces of information that must be displayed to the operator to
ensure their awareness of the current operating state of the system. A total of 29 SARs were
created for these two EFDs. Because the third EFD (DCAP Replanning) supersedes the
second (Replanning), the SARs of the second do not appear in this table.


Table B.4. List of Situational Awareness Requirements (SARs).

SAR | Level 1 (Perception) | Level 2 (Comprehension) | Level 3 (Projection)
SAR1 | Visual depiction of crew on deck (L1) | Location, current action, and safety of crew members (L1) | Destination and future safety of crew members (L1)
SAR2 | Visual depiction of aircraft on deck (L1) | Location, current action, and safety of deck aircraft (L1) | Destination and future safety of deck aircraft (L1)
SAR3 | Visual depiction of deck vehicles (L1) | Location, current action, and safety of deck vehicles (L1) | Destination and future safety of deck vehicles (L1)
SAR4 | Visual depiction of airborne aircraft (L2) | Location, current action, and safety of airborne aircraft (L1) | Destination and future safety of airborne aircraft (L1)
SAR5 | Visual depiction of current landing order (L2) | Current landing order (L2) | -
SAR6 | Visual depictions of current fuel states (L2) | Fuel levels of each aircraft (L2) | Likelihood of aircraft having insufficient fuel (L2)
SAR7 | Visual depiction of catapults and status (L3) | Availability and operability of catapults (L2) | Flexibility of launch assignments (L2)
SAR8 | Visual depiction of status of fuel stations (L3) | Availability and operability of fuel stations (L2) | -
SAR9 | Visual depiction of status of elevators (L3) | Availability and operability of elevators (L2) | -
SAR10 | Visual depiction of status of landing strip (L3) | Availability and operability of landing strip (L2) | Repercussions on airborne aircraft (L2)
SAR11 | Visual depiction of current time of deck crew (L4) | Fatigue level of the crew (L4) | Time remaining before shift end (L4)
SAR12 | Visual depiction of current shift time of pilots (L5) | Fatigue level of the pilot (L4) | -

Table B.5. List of Situational Awareness Requirements (SARs), continued.

SAR | Level 1 (Perception) | Level 2 (Comprehension) | Level 3 (Projection)
SAR13 | Visual/auditory alert to new aircraft failures (L2) | Newly developed aircraft failure (L2) | Effects of failure on aircraft schedule, safety (L2)
SAR14 | Visual/auditory alert to new resource failures (L3) | Newly developed resource failure (L3) | Effects of failure on aircraft schedule, safety (L2)
SAR15 | Visual depiction of current resource allocations (L6) | Current allocation of aircraft to catapults/LZ (L6) | Bottlenecks in resource usage (L6)
SAR16 | Visual depiction of current aircraft schedules (L6) | Current schedule of operations and level of delay (L6) | Effects of delays on overall operational time (L6)
SAR17 | Visual depiction of current aircraft failures (P6) | Currently known aircraft failures (P6) | Aggregate effects of failures on schedule (P6)
SAR18 | Visual depiction of current resource failures (P6) | Currently known resource failures (P6) | Aggregate effects of failures on schedule (P6)
SAR19 | List of personnel group rankings (P10) | Previous rankings of personnel groups (P10) | -
SAR20 | Currently selected aircraft (P11) | Aircraft currently being ranked (P11) | -
SAR21 | Current priority status of all aircraft (P12) | Previous priority status of aircraft (P12) | -

B.3  Final  Information  and  Functional  Requirements 

The final step in the hCTA process is the definition of a set of Information
Requirements (IRs) for the resulting system display. These come jointly from the SARs and
Decision Ladders developed from the Event Flow Diagrams. Table B.6 lists the IRs for the
DCAP interface; requirements are linked back to their corresponding Decision Ladder or
EFD element. These information requirements support four main functions of the DCAP
System: Monitoring the state of the world, Alerting the user to failures in the system,
Predicting the future performance of the schedule, and Supporting the operator in
replanning.

Table B.6. Information Requirements for the DCAP Interface.

ID | Information Requirement | Related SAR / DL
IR1 | Location of all crew on deck | SAR1
IR2 | Location of all deck aircraft | SAR2
IR3 | Location of all airborne aircraft | SAR3
IR4 | Location of all deck vehicles | SAR4
IR5 | Current landing order | SAR5, DL4
IR6 | Current fuel states | SAR6
IR7 | Current catapult status (available and operational) | SAR7
IR8 | Current fuel station status (available and operational) | SAR8
IR9 | Current elevator status (available and operational) | SAR9
IR10 | Current landing strip status (available and operational) | SAR10
IR11 | Current work time of crew | SAR11
IR12 | Current work time of pilots | SAR12
IR13 | Alert for new aircraft failures | SAR13, DL1
IR14 | Alert for new resource failures | SAR14, DL1
IR15 | Current resource allocation | SAR15, DL5, DL6
IR16 | Current schedules for all aircraft | SAR16, DL5, DL6
IR17 | Currently existing aircraft failures | SAR17
IR18 | Currently existing resource failures | SAR18
IR19 | Current personnel group rankings | SAR19
IR20 | Currently selected aircraft | SAR20
IR21 | Current priority status of all aircraft | SAR21
IR22 | Visual depiction of failure ID | DL1, DL5
IR23 | Visual depiction of failure type | DL1, DL5
IR24 | Visual depiction of failure details | DL1, DL5
IR25 | Description of future schedule | DL1
IR26 | Description of future resource allocations | DL1
IR27 | Visual display of current aircraft task | DL5, DL6
IR28 | Visual display of aircraft's upcoming tasks | DL5, DL6
IR29 | Visual display of available resources | DL6
IR30 | Visual display of future resource allocations | DL6
IR31 | Visual depiction of performance under proposed assignment | DL6
IR32 | Project aircraft position in Marshal Stack | DL5, DL6
IR33 | Temporal constraints on aircraft failure | DL5, DL6
IR34 | Visual/auditory alert to return of proposal | DL7
IR35 | Visual depiction of solution for failed aircraft | DL7
IR36 | Visual depiction of changes in resource allocation | DL7
IR37 | Visual depiction of changes in aircraft schedules | DL7
IR38 | Visual notation of points of infeasibility (non-adherence) for priority aircraft | DL7
IR39 | Visual depiction of predicted aircraft schedules | DL7
IR40 | Visual depiction of predicted resource allocation | DL7
IR41 | Visual depiction of relative schedule performance | DL7

Appendix C - Tutorial for the DCAP Simulation

Figure C.1. Tutorial Step 1 - requesting a schedule.

Step 1. Click the Request Schedule button in the upper right portion of the screen.

Figure C.2. Tutorial Step 2 - select whether to change variable rankings.

Step 2. The user is given the option of changing the rankings in the Variable Ranking
Tool (VRT). Clicking “Yes” brings this window to the interior of the screen and makes it
actionable. Clicking “No” skips the re-ranking step and leaves the variables with their
current values. The user also has the option to permanently skip this step through a
checkbox at the bottom of the frame.

Figure C.3. Tutorial Step 3 - changing variable rankings.


Step 3. If users decide to re-rank the four major personnel groups, they simply click and
drag icons within the five levels in the VRT frame. Variables can be ranked in any manner
within the five levels provided: all on one level, each on a separate level, or any
combination in between. When users are satisfied with the rankings, they click the
“Submit” button at the bottom of the screen to submit the rankings to the algorithm.
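
The ranking rule in Step 3 (every variable on exactly one of five levels, with ties allowed) can be sketched as a small validity check. This is an illustration only: the dictionary representation, the numbering of levels from 1 to 5, and the function name `validate_ranking` are assumptions, and the group names are those used for the variable rankings in Appendix D.

```python
# Group names as used in the Appendix D scenario descriptions.
GROUPS = ("Deck Aircraft", "Airborne Aircraft", "Crew Working", "Deck Support Vehicles")
LEVELS = range(1, 6)  # five levels; numbering them 1-5 is an assumption


def validate_ranking(ranking):
    """Check that every group is placed on exactly one of the five levels."""
    missing = set(GROUPS) - set(ranking)
    extra = set(ranking) - set(GROUPS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    for group, level in ranking.items():
        if level not in LEVELS:
            raise ValueError(f"{group}: level {level} is outside 1-5")
    return ranking


# All variables on one level, and a mixed assignment, are both acceptable.
all_equal = validate_ranking({group: 3 for group in GROUPS})
mixed = validate_ranking({
    "Deck Aircraft": 5,
    "Airborne Aircraft": 1,
    "Crew Working": 3,
    "Deck Support Vehicles": 3,
})
```

Because several variables may share a level, the check deliberately does not require the levels to be distinct.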

Figure C.4. Tutorial Step 4 - defining aircraft priorities.

Step 4. Clicking “Request Schedule” causes a set of checkboxes to appear next to aircraft
in the Aircraft Schedule Panel (ASP). Checking one of these boxes designates the
associated aircraft as “priority” to the algorithm. This is a binary condition: aircraft are
either priority cases, or they are not. Users may assign priority to all aircraft or to none.

Figure C.5. Tutorial Step 5 - suggesting aircraft schedules.

Step 5. Defining an aircraft as priority leads to an additional change in the ASP: the
timeline for the associated aircraft splits horizontally into two segments. The upper
segment depicts the current operating schedule of the aircraft. The lower segment is
used to submit the operator’s desired schedule for this aircraft to the algorithm. When
the timeline is split, the bottom bar becomes actionable and can be dragged left or right to
accelerate or postpone the aircraft’s schedule of tasks. In certain cases, individual tasks
can be lengthened or shortened by changing the size of the associated color block. When
users have completed their specification, they press the “Submit” button above the first
aircraft in the list.
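
Dragging the lower bar amounts to applying one time offset to every task of the aircraft. A minimal sketch of that operation follows; the tuple layout `(name, start, end)`, the task names, and the function `shift_schedule` are hypothetical, and treating a negative offset as "accelerate" is an assumption about the timeline's orientation.

```python
def shift_schedule(tasks, offset):
    """Shift every (name, start, end) task by the same offset.

    A negative offset moves the whole schedule earlier (accelerate);
    a positive offset moves it later (postpone). Durations are unchanged.
    """
    return [(name, start + offset, end + offset) for name, start, end in tasks]


current = [("taxi", 10, 15), ("launch", 15, 17)]
suggested = shift_schedule(current, -5)
print(suggested)
```

Lengthening or shortening an individual task, which the interface allows only in certain cases, would change a single end time and is not covered by this sketch.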

Figure C.6. Tutorial Step 6 - review the proposed schedule.

Step 6. After the user presses both “Submit” buttons (one each in the VRT and ASP), the
algorithm processes the user inputs in light of the current state of operations on deck,
creating a schedule proposal. This proposal is displayed to the operator through
modifications of the ASP, the Deck Resource Timeline (DRT), and the appearance of the
Disruption Visualization Tool (DVT). The ASP and DRT modifications display the current
schedule on top and the proposed schedule on bottom (akin to the method of suggesting
aircraft schedules in the previous step), allowing users to make a one-to-one comparison of
changes in the schedules. The DVT shows the relative effectiveness of the schedule
overall. Green triangles signify that all tasks for a variable group are completed in less
operating time, while red triangles signify that the new schedule demands more operating
time of this group. These states are reinforced by the size of the triangle as determined by
its edge distance from the dashed black line. This line depicts no change between the
schedules, such that green triangles appear within the dashed line, and red triangles
appear outside it.
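
The DVT color rule can be written as a short comparison. This is a sketch under stated assumptions, not the DCAP implementation: the function name `dvt_marker`, the use of a single operating-time number per variable group, and the use of the absolute difference as the distance from the dashed line are all hypothetical simplifications.

```python
def dvt_marker(current_time, proposed_time):
    """Return (color, distance) for one variable group.

    green: the proposed schedule needs less operating time than the current one
    red:   the proposed schedule needs more operating time
    None:  no change, i.e. the marker sits on the dashed reference line
    """
    change = proposed_time - current_time
    if change < 0:
        color = "green"
    elif change > 0:
        color = "red"
    else:
        color = None
    return color, abs(change)


print(dvt_marker(100.0, 80.0))   # less operating time under the proposal
print(dvt_marker(100.0, 130.0))  # more operating time under the proposal
```

Returning the magnitude alongside the color mirrors the description that triangle size reinforces the direction of change.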

Figure C.7. Tutorial Step 7 - adjust proposed schedule.

Step 7. The user retains the ability to make further suggestions to the scheduler at this
juncture. The ASP timeline bars remain actionable, giving users the option to fine-tune
the proposed schedule before implementation. Doing so, however, invalidates
the current schedule and forces the algorithm to perform a new round of scheduling
calculations. If changes are made, the user again presses “Submit” to initiate schedule
generation.

Figure C.8. Tutorial Step 8 - accepting the proposed schedule.

Step 8. When users have a desirable schedule, they press “Accept” in the upper right
corner of the interface to implement the schedule into the environment.

Figure C.9. Tutorial Step 9 - returning to a monitoring state.

Step 9. Pressing the “Accept” button then returns the operator to the monitoring state
until a need to request a new schedule arises.




Appendix  D  -  Expert  User  Replanning  Actions 

D.1 Replanning Actions for the Expert User


As noted previously, in a further effort to reduce variability in the system so as to
accurately depict the performance of the algorithm, the replanning actions of the Expert
User were standardized according to the details in each scenario. The actions taken by the
Expert User were dictated by the application of the expert user heuristics discussed in
Chapter 4.1.1. The following sections describe the standard actions that were taken
during replanning for each scenario.


D.1.1  Planning  for  the  Simple  Scenario 

In performing manual planning for the Simple scenario, SME heuristics 1, 2, 4, and 6
were applied. This does not imply, however, that changes to other aircraft schedules did
not occur. Table D.1 lists the aircraft whose new schedules resulted in catapult
reassignments and the order in which replanning occurred. Table D.2 lists the new
assignments with the desired ordering of launches; asterisks (*) denote aircraft whose
assignment was changed.


Table D.1. List of aircraft with catapult reassignments.

Aircraft | Original Assignment | New Assignment
SMAC #2 | 3 | 2
FMAC #2 | 3 | 2
FMAC #5 | 3 | 2
FMAC #8 | 3 | 2
FMAC #11 | 3 | 4
FUAV #2 | 3 | 4
FUAV #4 | 2 | 4

Table D.2. Schedule assignment after manual replanning;
asterisks (*) denote aircraft whose assignment was changed.


Catapult  1 

Catapult  2 

Catapult  3 

Catapult  4 

Disabled 

- 

SUAV  #2 

SUAV  #1 

- 

Disabled 

SMAC  #1 

SMAC  #2* 

FMAC  #1 

FMAC  #2* 

FMAC  #4 

FMAC  #3 

FMAC  #7 

FMAC  #5* 

FMAC  #10 

FMAC  #6 

FMAC  #11* 

FMAC  #8* 

FUAV  #1 

FMAC  #9 

FUAV  #2* 

FMAC  #12 

FUAV  #4* 

FUAV  #3 

- 

In performing the manual, human-only replanning, the user clicked on the aircraft,
pressed a trigger key (the F1 key) on the computer keyboard, and then clicked on the
destination catapult. After the assignment to the new catapult was made, the aircraft
immediately implemented the new schedule and began moving to its new destination. After
completing all reassignments, the user pressed a second, different trigger key (F10) on
the computer keyboard to signal the end of replanning. After this, no further user action
was required and the scenario could run to completion.

In performing the automated planning for this Simple scenario, the user first clicked
the “Request Schedule” button in the upper right hand corner of the screen. Given that
this scenario is a launch configuration, the variable rankings were set to assign Deck
Aircraft as the highest priority, Crew Working and Deck Support vehicles as medium
priority, and Airborne Aircraft (none in the system) as the lowest priority. In the Aircraft
Schedule Panel, the same aircraft that were given schedule changes in the manual
replanning case (Table D.2) were given priority designations. However, no schedule changes
were suggested; the schedules were left in their current state, signaling to the scheduler a
desire to adhere to the current launch time schedule. In the returned schedule, shifts
forward were acceptable, but movements backward in time would not be accepted. After
receiving the new plan, the user reviewed the schedules for each of these aircraft to
check whether the new schedule satisfied these criteria.

D.1.2  Planning  for  the  Moderate  Scenario 

In the Moderate Scenario, only two aircraft required replanning. The first aircraft was
a Fast Manned Aircraft that encountered a hydraulic leak, the second a Slow Manned
Aircraft that encountered a fuel leak. In replanning for this scenario, five SME heuristics
were applied (1, 2, 3, 7, and 9). The main emphasis is on Heuristic 9 and the
differentiation between True and Urgent emergencies. Fuel leaks are considered a major, True
emergency that must be handled immediately. The hydraulic leaks, as modeled in the
system, develop more slowly and are considered to be less problematic. Furthermore,
hydraulic failures increase the likelihood of accidents and debris at landing, which may
further limit the ability of the deck to recover aircraft and create more failures in the system.

For the manual, human-only planning condition, this resulted in moving the fuel leak
aircraft forward in the Marshal Stack, assigning it the most immediate landing slot, and
moving the aircraft with the hydraulic failure backwards to the last manned position in the
Marshal Stack. This preserves Heuristic 3 (Safety of Pilots and Crew). In executing these
actions, the user again clicked the aircraft, pressed F1, and then clicked either the deck
(to issue an immediate landing order) or an alternate area signifying a move to the back
of the Marshal Stack. The user again pressed F10 to signify the conclusion of the planning
activities.

For the human-algorithm combined planning, group variables were set opposite to
those of the Simple scenario. Crew Working and Deck Support Vehicles still received a
moderate rating, but the Airborne Aircraft were now of primary concern. Deck Aircraft
were moved to the lowest setting, as there were none in the system. In the Aircraft
Schedule Panel (ASP), the two failed aircraft were assigned priority ratings, but in this
case, the Expert User also provided suggestions for the schedules of these aircraft. The
SMAC with the fuel failure was given a suggestion to move forward in time, while the FMAC
with the hydraulic failure was given a suggestion to delay operations. After receiving the
proposed schedule, the user reviewed the aircraft schedules to ensure that the SMAC had
been moved forward in time, as this was the primary concern. The backward movement of the
FMAC was a secondary concern, as the suggested schedule may actually induce a limit
violation and be unacceptable to the planning algorithm.
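
The review step described above (a hard check that the fuel leak aircraft moves forward, and a soft check that the hydraulic failure aircraft is delayed) might be sketched as follows. The function name, the dictionary-based schedule representation, and the landing times are hypothetical and serve only to illustrate the primary/secondary distinction.

```python
from typing import Dict, Iterable, List, Tuple


def review_proposed_schedule(
    current: Dict[str, float],
    proposed: Dict[str, float],
    must_advance: Iterable[str],
    may_delay: Iterable[str],
) -> Tuple[bool, List[str]]:
    """Compare proposed landing times against the current schedule.

    Aircraft in `must_advance` have to land earlier than currently planned
    (primary concern); a missing delay for aircraft in `may_delay` is only
    noted (secondary concern) and does not reject the plan.
    """
    acceptable = True
    notes: List[str] = []
    for name in must_advance:
        if proposed[name] >= current[name]:
            acceptable = False
            notes.append(f"{name}: not moved forward in time (primary concern)")
    for name in may_delay:
        if proposed[name] <= current[name]:
            notes.append(f"{name}: not delayed (secondary concern, tolerated)")
    return acceptable, notes


# Hypothetical landing times in minutes from now.
current = {"SMAC-2": 40.0, "FMAC-6": 20.0}
proposed = {"SMAC-2": 10.0, "FMAC-6": 20.0}
ok, notes = review_proposed_schedule(current, proposed, ["SMAC-2"], ["FMAC-6"])
```

Here the plan is accepted because the SMAC lands earlier, even though the algorithm did not delay the FMAC.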

D.1.3  Planning  for  the  Complex  Scenario 

The Complex scenario includes aspects of both the Simple and Moderate cases, and
thus replanning included all of the above heuristics (1, 2, 3, 4, 6, 7, and 9). For the
manual case, application of these heuristics resulted in moving the SMAC with the fuel leak
forward in the Marshal Stack, inserting it into the first available landing configuration.
However, planning for the aircraft on deck required ensuring that these aircraft were not
sent to Catapults 3 and 4. If one of these aircraft were to become incapacitated in the
Landing Strip, the SMAC would be placed in serious danger. As such, both of these aircraft
were sent to Catapult 2, ensuring that no conflicts were created with the Landing
Strip. The physical actions to do this are identical to those used for the deck and airborne
aircraft in the previous two scenarios.

For  the  human-algorithm  case,  the  strategy  was  similar  to  the  Moderate  case.  Variable 
rankings  were  slightly  adjusted  due  to  the  lower  fuel  levels  of  aircraft  in  this  scenario; 
while  Airborne  Aircraft  retained  the  highest  priority,  all  other  variables  were  placed  in 
the  lowest  priority  bin.  This  placed  even  greater  emphasis  on  the  critical  nature  of  the  fuel 
states of these aircraft. The failed aircraft was given a priority designation in the ASP
with a suggestion to accelerate the schedule. When reviewing the proposed plan, the user
examined  individual  aircraft  schedules  to  ensure  that  the  failed  SMAC  was  given  a 
schedule  that  moved  it  forward  in  time. 
Appendix E - Metrics with Low External Validity
Correlations Transformed Variables

         TATT   TAAT   TCAT   TUAT    TAT    WTC WTCrew    PHV  PHV_D    SHV  SHV_D     MD   C2LR   C3LR   C4LR   TCLR   LZFT
TATT    1.000   .635  -.089   .059   .652   .788   .170   .343   .562   .480   .566   .770  -.570  -.782  -.782  -.723   .750
TAAT     .635  1.000   .266   .401   .939   .499   .470   .070   .591   .142   .310   .740  -.723  -.605  -.605  -.484   .741
TCAT    -.089   .266  1.000   .330   .219  -.212   .342  -.488   .257  -.587  -.006   .079  -.116   .141   .141   .196   .041
TUAT     .059   .401   .330  1.000   .496  -.317   .899  -.275   .567  -.314  -.010   .046  -.126   .144   .144   .213   .012
TAT      .652   .939   .219   .496  1.000   .534   .514   .131   .696   .181   .373   .766  -.752  -.628  -.628  -.526   .777
WTC      .788   .499  -.212  -.317   .534  1.000  -.281   .394   .349   .571   .496   .847  -.669  -.959  -.959  -.959   .871
WTCrew   .170   .470   .342   .899   .514  -.281  1.000  -.305   .495  -.323  -.011   .069  -.087   .102   .102   .203   .018
PHV      .343   .070  -.488  -.275   .131   .394  -.305  1.000  -.027   .827   .156   .151  -.227  -.296  -.296  -.286   .274
PHV_D    .562   .591   .257   .567   .696   .349   .495  -.027  1.000   .114   .587   .572  -.576  -.436  -.436  -.381   .554
SHV      .480   .142  -.587  -.314   .181   .571  -.323   .827   .114  1.000   .383   .353  -.340  -.503  -.503  -.523   .420
SHV_D    .566   .310  -.006  -.010   .373   .496  -.011   .156   .587   .383  1.000   .530  -.406  -.466  -.466  -.483   .506
MD       .770   .740   .079   .046   .766   .847   .069   .151   .572   .353   .530  1.000  -.845  -.901  -.901  -.850   .971
C2LR    -.570  -.723  -.116  -.126  -.752  -.669  -.087  -.227  -.576  -.340  -.406  -.845  1.000   .743   .743   .694  -.907
C3LR    -.782  -.605   .141   .144  -.628  -.959   .102  -.296  -.436  -.503  -.466  -.901   .743  1.000  1.000   .972  -.921
C4LR    -.782  -.605   .141   .144  -.628  -.959   .102  -.296  -.436  -.503  -.466  -.901   .743  1.000  1.000   .972  -.921
TCLR    -.723  -.484   .196   .213  -.526  -.959   .203  -.286  -.381  -.523  -.483  -.850   .694   .972   .972  1.000  -.871
LZFT     .750   .741   .041   .012   .777   .871   .018   .274   .554   .420   .506   .971  -.907  -.921  -.921  -.871  1.000

Dimension       1      2      3      4      5      6      7      8      9     10     11     12     13     14     15     16     17
Eigenvalue  9.025  3.679  1.403   .911   .638   .421   .291   .168   .148   .122   .074   .055   .038   .016   .006   .005   .000
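
As a reading aid for the eigenvalue row of this correlation matrix, the following Python sketch converts the 17 listed eigenvalues into proportions of explained variance. The cutoff of 1.0 (the Kaiser criterion) is an assumption used for illustration; this appendix does not state which retention rule was applied.

```python
# Eigenvalues as listed for the 17 dimensions of the correlation matrix above.
eigenvalues = [9.025, 3.679, 1.403, 0.911, 0.638, 0.421, 0.291, 0.168, 0.148,
               0.122, 0.074, 0.055, 0.038, 0.016, 0.006, 0.005, 0.000]

# For a correlation matrix the eigenvalues sum to the number of variables.
total = sum(eigenvalues)

# Proportion of total variance carried by each dimension.
explained = [ev / total for ev in eigenvalues]

# Running (cumulative) proportion of variance.
cumulative = []
running = 0.0
for share in explained:
    running += share
    cumulative.append(running)

# Assumed retention rule (Kaiser criterion): keep dimensions with eigenvalue > 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]
```

Under that assumed rule, three dimensions would be retained, and together they account for roughly 83% of the total variance.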

Appendix F - Principal Components Analysis Results


Table  F.  2.  PCA  Results  for  the  Simple  scenario,  Human-Only  planning  condition. 


Correlations  Transformed  Variables 


EZS53 

Ki 

asm 

lll/JU 

ESI 

iron 

[2in 

XU! 

msp1»' 

lim 

Pin 

ITT1 

FI 

K71 

K!l;l 

>WI:1 

WT3 

■KOI* 

.956 

.735 

-.026 

.955 

.806 

■fcfcT: 

.159 

-.498 

.693 

.208 

.682 

.221 

.959 

Tist 

-.498 

-.694 

.813 

-.806 

.460 

-.917 

.788 

TAAT5 

.959 

.677 

.151 

.998 

.765 

.952 

.288 

-.495 

.072 

-.187 

-.495 

-.662 

.766 

-.759 

.414 

-.964 

.755 

TCAT 

.735 

.677 

1.000 

-.050 

.677 

.685 

.740 

.172 

-.287 

.598 

.554 

.107 

.677 

-.192 

-.287 

-.392 

-.904 

.694 

-.690 

.173 

--58C 

.627 

rUAT 

.151 

-.050 

.167 

-.025 

.924 

.096 

-.295 

-.368 

.151 

-.438 

.096 

-.084 

.054 

-.149 

-.133 

-.061 

TATa 

.955 

.998 

.677 

.167 

.765 

.948 

.303 

-.478 

.636 

.068 

.998 

-.198 

-.478 

-.599 

-.659 

.766 

-.759 

.416 

-.962 

.757 

WTQMS 

.806 

.765 

.685 

-.045 

.765 

.801 

-.555 

.976 

.212 

.958 

.223 

.765 

-.161 

-.555 

-.557 

.999 

-.999 

.790 

-.731 

99C 

WTC3 

.998 

.952 

.740 

-.025 

.948 

.801 

1.000 

.169 

-.481 

.685 

.192 

.675 

.952 

-.  1 5C 

-.481 

-.612 

-.695 

-.802 

.456 

-.910 

.779 

WTCrew 

.159 

.288 

.172 

.924 

.303 

.14C 

.169 

1.000 

.025 

.086 

-.254 

-.319 

-.506 

.025 

-.238 

.140 

-.139 

-.017 

-.239 

.104 

WTI 

-.498 

-.495 

-.287 

.096 

-.478 

-.555 

-.481 

.025 

-.511 

-.220 

-.540 

-.248 

-.495 

.246 

.75C 

.242 

-.545 

.545 

-.459 

.461 

-.584 

PHV 

.693 

.633 

.598 

-.091 

.636 

.976 

.685 

.086 

-.511 

.271 

.990 

.279 

.633 

-.165 

-.511 

-.896 

-.442 

.975 

-.977 

.849 

-.611 

.976 

PHV  D 

.208 

.055 

.122 

-.295 

.051 

.212 

.192 

-.254 

-.220 

.271 

.332 

.959 

.055 

.086 

-.220 

-.224 

-.106 

.223 

-.223 

.218 

-.051 

SHV 

.605 

.554 

-.170 

.604 

.958 

.675 

.006 

-.540 

.990 

.333 

.605 

-.136 

-.540 

-.908 

-.377 

.957 

-.959 

.877 

-.59C 

.965 

SHV  D 

.221 

.072 

.107 

-.368 

.068 

.223 

.202 

-.319 

-.248 

.279 

.959 

.333 

.072 

.119 

-.248 

-.241 

-.111 

.234 

-.234 

.228 

-.075 

IVDa 

.959 

.677 

.151 

.998 

.765 

952 

.288 

-.495 

.633 

.072 

-.187 

-.495 

-.662 

.766 

-.759 

.414 

-.964 

.755 

UCD 

-.150 

-.187 

-.192 

-.438 

-.198 

-.161 

-.506 

.246 

-.165 

-.136 

.119 

-.187 

.246 

.139 

-.167 

.166 

-.081 

.140 

-.149 

UIT 

-.498 

-.495 

-.287 

.096 

-.478 

-.555 

-.481 

-.511 

-.248 

-.495 

.246 

.75C 

.242 

-.545 

.545 

-.459 

.461 

-.584 

UU 

-.392 

.099 

-.599 

-.888 

-.612 

-.038 

.75C 

-.896 

-.224 

-.241 

.75C 

.221 

-.883 

.885 

-.893 

.604 

-.919 

C2La 

-.694 

-.662 

-.904 

-.084 

-.659 

-.557 

-.695 

-.238 

.242 

-.442 

-.106 

-.377 

-.111 

-.662 

.139 

.242 

.221 

-.562 

.557 

.060 

.549 

-.489 

C4L 

.813 

.766 

.694 

-.053 

.766 

.999 

.809 

.140 

-.545 

.975 

.957 

.234 

.766 

-.167 

-.545 

-.883 

-.562 

.790 

-.731 

.989 

C2LR 

-.806 

-.759 

-.690 

.054 

-.759 

-.999 

-.802 

-.139 

.545 

-.977 

-.959 

-.234 

-.759 

.166 

.545 

.885 

.557 

-.793 

.724 

-.989 

C4LRa 

,46C 

.414 

.173 

-.149 

.416 

79C 

.456 

-.017 

-.459 

.845 

.218 

.877 

.414 

-.459 

-.893 

.060 

-.793 

-.454 

.826 

7CLRa 

-.917 

-.964 

-.580 

-.133 

-.962 

-.731 

-.239 

.461 

-.611 

-.590 

-.075 

-.964 

.  14C 

.461 

.604 

.549 

-.731 

.724 

-.454 

-.736 

LZFT 

.788 

.755 

.627 

.757 

99C 

.779 

-.584 

.976 

.965 

.283 

.755 

-.149 

-.584 

-.919 

-.489 

.989 

-.989 

.826 

-.735 

Dimension 

1 

2 

3 

A 

5 

6 

7 

8 

9 

1C 

11 

m 

mg 

14 

15 

16 

17 

18 

19 

20 

21 

22 

23 

Eigenvalue 

13.20 

1 

3.291 

1.428 

1.301 

.918 

.517 

.111 

.073 

m 

m 



Table F.3. PCA Results for the Simple scenario, Human-Algorithm planning condition.


Table  F.  4.  Cross-Correlations  for  the  Simple  Scenario. 


Simple  Scenario 


TATT 

TAAT 

TUAT 

TAT 

WTC 

WTCrew 

PHV 

SHV 

MD 

C2LR 

C4LR 

TCLR 

LZFT 

WTI 

UIT 

UU 

TATT 

X 

TAAT 

X 

TUAT 

X 

TAT 

X 

X 

WTC 

X 

X 

WTCrew 

X 

X 

PHVa 

X 

SHV 

X 

X 

MD 

X 

X 

X 

X 

X 

C2LR 

X 

X 

X 

X 

C4LR 

X 

X 

TCLR 

X 

X 

X 

X 

X 

LZFT 

X 

X 

X 

X 

X 

X 

X 

X 

X 

WT1 

X 

UIT 

X 

X 

UU 

X 

X 

X 

In this table, an X denotes that the two metrics were highly correlated in all of the
planning conditions (Table F.1 - Table F.3). For example, TATT and MD were found to be
highly correlated in the B, HO, and HA planning conditions within the Simple scenario.
TATT and TAAT, however, were not highly correlated in at least one of these
three planning conditions.
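
The rule behind the X marks (a pair is marked only if it is highly correlated in every planning condition) can be sketched in Python as an intersection of thresholded correlation sets. The threshold of 0.7, the function name, and the toy metric names and values below are assumptions for illustration; they are not taken from the tables.

```python
from typing import Dict, FrozenSet, Set, Tuple

PairCorrelations = Dict[Tuple[str, str], float]


def consistently_correlated(
    matrices: Dict[str, PairCorrelations], threshold: float = 0.7
) -> Set[FrozenSet[str]]:
    """Return the metric pairs whose |r| meets the threshold in every condition."""
    common = None
    for correlations in matrices.values():
        # Pairs that are "highly correlated" within this one condition.
        high = {frozenset(pair) for pair, r in correlations.items()
                if abs(r) >= threshold}
        # Keep only the pairs that were also high in all earlier conditions.
        common = high if common is None else common & high
    return common or set()


# Toy values only; M1, M2, M3 stand in for real metrics.
toy = {
    "B":  {("M1", "M2"): 0.90, ("M1", "M3"): 0.50},
    "HO": {("M1", "M2"): -0.80, ("M1", "M3"): 0.90},
    "HA": {("M2", "M1"): 0.75, ("M1", "M3"): 0.95},
}
marked = consistently_correlated(toy)
```

In the toy data only the M1/M2 pair would receive an X, because M1/M3 falls below the assumed threshold in condition B.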
Transformed Variables


MAC  6  EART 

FMAC  6  HER 

FMAC  6  AAT 

SMAC  2  EART 

SMAC  2  EFR 

SMAC  2  AAT 

-.353 

.311 

-.461 

.487 

.463 

-.311 

.513 

.528 

-.360 

.345 

.310 

-.097 

.105 

-.412 

.473 

.587 

-.053 

.172 

.567 

-.012 

.•cw'Mprrvf 

.342 

-.337 

.521 

.408 

-.449 

.485 

.894 

-.104 

.219 

.938 

-.134 

.043 

-.096 

-.140 

.307 

-.099 

-.351 

.445 

-.135 

-.241 

.215 

-.069 

-.285 

.302 

.318 

-.434 

.464 

.348 

-.212 

.144 

\  ■yrr 

.079 

-.095 

-.478 

-.113 

.186 

.088 

-.256 

.292 

pSSfpBEE 

-.277 

.318 

-.067 

.351 

-.116 

-.168 

-.139 

.193 

.138 

-.369 

.551 

.235 

-.447 

.091 

.884 

-.083 

.011 

-.832 

-.108 

.121 

-.193 

.091 

-.832 

.213 

-.104 

.192 

.884 

.213 

-.035 

-.083 

-.104 

-.035 

-.910 

.011 

.192 

-.032 

-.910 

14 

16 

17 

18 

.061 

.030 

.020 

.011 

Table  F.  5.  PCA  Results  for  the  Moderate  scenario,  Baseline  planning  condition. 




Correlations  Transformed  Variables 


TATT 

TAAT 

TCAT 

TUAT 

TAT 

WTQMS 

WTCrew 

WTI 

WTO 

PHV 

PHV  D 

SHV 

SHV  D 

MD 

UIC 

UCD 

UIT 

UU 

LZFT 

FMAC  6  EART 

FMAC  6  HER 

FMAC  6  AAT 

SMAC  2  EFR 

SMAC  2  AAT 

TETT 

■l.uiW 

.2o  2 

.704 

-.116 

.202 

.261 

.965 

7SH1 

-.465 

-Mi) 

.559 

.255 

.48? 

.jjs 

.022 

.534 

-.09b 

-.334 

.§54 

.247 

-.629 

.55C 

-.362 

TAAT 

.202 

1.000 

.408 

.403 

1.000 

.999 

.302 

-.376 

-.143 

.100 

-.152 

-.250 

-.292 

.998 

-.201 

-.118 

-.366 

-.454 

.218 

.997 

-.608 

.762 

-.161 

.156 

TCAT 

.704 

.408 

1.000 

-.085 

.408 

.409 

.746 

-42C 

-.534 

.071 

-.01C 

.029 

.143 

.424 

.268 

.165 

-.287 

-.531 

.724 

.460 

-.853 

.789 

-.23? 

.242 

TUAT 

-.lie 

.403 

-.085 

1.000 

.403 

.404 

.047 

-.081 

.161 

.067 

-.103 

-.218 

-.375 

40C 

-.255 

-.142 

-.136 

-.047 

-.152 

.373 

-.054 

.165 

.060 

-.074 

TAT 

.202j 

1.000 

.408 

.403 

1.000 

.999 

.302 

-.376 

-.143 

.10C 

-.152 

-.250 

-.292 

.998 

-.201 

-.118 

-.366 

-.454 

.218 

.997 

-.608 

.762 

-.161 

.158 

WTQMS 

.201 

.999 

.409 

.404 

.999 

1.000 

.303 

-.372 

-.137 

.103 

-.152 

-.255 

-.291 

.998 

-.196 

-.124 

-.357 

-.450 

.217 

.997 

-.607 

.761 

-.167 

.162 

WT  Crew 

.965 

.302 

.746 

.047 

.302 

.303 

1.000 

-.236 

-.393 

-.034 

.478 

.204 

.447 

.336 

.016 

.507 

-.112 

-.338 

.914 

.345 

-.693 

.645 

-.257 

.252 

wr t 

-.201 

-.376 

-42q 

-.081 

-.376 

-.372 

-.236 

1.00C 

.599 

-.184 

.224 

-.202 

.113 

-.375 

-.194 

-.15C 

.663 

.866 

-.249 

-.393 

.449 

-.449 

.184 

-.186 

WTO3 

-.465 

-.143 

-.534 

.161 

-.143 

-.137 

-.393 

.59? 

1.000 

.092 

-.239 

-.285 

-.032 

-.13? 

-.189 

-.353 

.656 

.796 

-.489 

-.165 

.477 

-.278 

.276 

-.294 

PHV' 

-.129 

.100 

.071 

.067 

.100 

.103 

-.034 

-.184 

.092 

1.00C 

-.457 

.102 

-.03? 

.112 

-.267 

.06? 

-.016 

-.119 

-.127 

.101 

-.161 

.154 

.170 

-.186 

PHV  D3 

.559 

-.152 

-.010 

-.103 

-.152 

-.152 

.478 

.224 

-.239 

-.457 

1.00C 

.156 

.463 

-.138 

-.186 

.313 

-.005 

.049 

.416 

-.135 

-.055 

-.028 

-.241 

.254 

SHVa 

.255 

-.250 

.029 

-.218 

-.250 

-.255 

.204 

-.202 

-.285 

.102 

.156 

1.000 

.608 

-.243 

-.129 

.52? 

-.045 

-.141 

.303 

-.252 

.025 

-.113 

.184 

-.181 

SHV  D 

.487 

-.292 

.143 

-.375 

-.292 

-.291 

.447 

.112 

-.032 

-.03? 

.463 

.608 

1.000 

-.263 

-.124 

.474 

.287 

.103 

.481 

-.272 

-.042 

-.003 

-.055 

.062 

MD 

.233 

.998 

.424 

.400 

.998 

.998 

.336 

-.375 

-.139 

.112 

-.138 

-.243 

-.263 

1.00C 

-.198 

-.088 

-.359 

-.456 

.246 

.996 

-.616 

.771 

-.174 

.171 

UIC 

.022 

-.201 

.268 

-.255 

-.201 

-.196 

.016 

-.194 

-.189 

-.267 

-.186 

-.129 

-.124 

-.198 

1.000 

-.051 

-.227 

-.168 

.073 

-.169 

-.131 

-.006 

,03q 

-.042 

UCD 

.5341 

-.118 

.165| 

-.142 

-.118 

-.124 

.507 

-.1 5C 

-.353 

.06? 

.313 

.529 

.474 

-.088 

-.051 

1.00C 

-.080 

-.209 

.570 

-.117 

-.132 

-.054 

-.080 

.08? 

UIT 

-.096 

-.366 

-  287 

-.136 

-.366 

-.357 

-.112 

.663 

.656 

-.016 

-.005 

-.045 

.287 

-.35? 

-.227 

-.08C 

1.000 

.779 

-.137 

-.370 

.404 

-.307 

.064 

-.072 

UU3 

-.334 

-.454 

-.531 

-.047 

-.454 

-.450 

-.338 

.866 

.796 

-.11? 

.049 

-.141 

.103 

-.456 

-.168 

-.20? 

.779 

1.000 

-.379 

-.474 

.559 

-.512 

.237 

-.241 

LZFT 

.954 

.218 

.724 

-.152 

.218 

.217 

.914 

-.24? 

-.489 

-.127 

.416 

.303 

.481 

.246 

.073 

.57C 

-.137 

-.379 

1.000 

.259 

-.644 

.546 

-.304 

.306 

FMAC  6  EART 

.247 

.997 

.460 

.373 

.997 

.997 

.345 

-.393 

-.165 

.101 

-.135 

-.252 

-.272 

.996 

-.169 

-.117 

-.370 

-.474 

.259 

1.000 

-.654 

.802 

-.172 

.166 

FMAC  6  HER 

-.629 

-.608 

-.853 

-.054 

-.608 

-.607 

-.693 

.44? 

.477 

-.161 

-.055 

.025 

-.042 

-.616 

-.131 

-.132 

.404 

.559 

-.644 

-.654 

1.000 

-.913 

.282! 

-.286 

FMAC  6  AAT 

.550 

.762 

.789 

.165 

.762 

.761 

.645 

-.445 

-.278 

.154 

-.028 

-.113 

-.003 

.771 

-.006 

-.054 

-.307 

-.512 

.546 

.802 

-.913 

1.000 

-.219 

.214 

SMAC  2  EFR 

-.302 

-.161 

-.237 

.060 

-.161 

-.167 

-.257 

.184 

.276 

.1 7C 

-.241 

.184 

-.055 

-.174 

.032 

-.08C 

.064 

.237 

-.304 

-.172 

.282 

-.219 

1.000 

-.996 

SMAC  2  AAT 

.302j 

.158 

.243 

-.074 

.158 

.163 

.253 

-.186 

-.294 

-.188 

.254 

-.181 

.062 

.171 

-.042 

.08? 

-.073 

-.241 

.306 

.168 

-.285 

.214 

-.996 

1.00C 

Dimension 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

21 

22 

23J 

24 

Eigenvalue 

8.658 

4.786 

2.503 

2.086 

1.500 

1.257 

.794 

.625 

.525 

.358 

.285 

.178 

.153 

.136 

.072 

.050 

.017 

.010 

.005 

.003 

.001 

.001 

.000 

.00c 

Table  F.  6.  PCA  Results  for  the  Moderate  scenario,  Human-Only  planning  condition. 


Correlations  Transformed  Variables 


TATT 

TAAT 

TCAT 

TUAT 

TAT 

WTQMS 

WTCrew 

WTI 

PHV 

PHV  D 

SHV 

SHV  D 

MD 

UCD 

UIT 

UU 

LZFT 

FMAC  6  EART 

FMAC  6  HER 

FMAC  6  AAT 

SMAC  2  EART 

SMAC  2  EFR 

SMAC  2  AAT 

TETT 

IKiTiH 

.oar1 

-.0/9 

Ao‘2 

098 

.577 

-.302 

~55S 

n»:< 

.419 

.347 

-.232 

-.302 

-.33;/ 

■Mill 

.309 

-.322 

.345 

.311 

-.32: 

-.084 

lMT* 

.097 

1.000 

.092 

.069 

1.000 

-.058 

-.055 

.088 

-.551 

.154 

-.294 

-.158 

-.055 

-.074 

-.037 

.031 

-.021 

-.033 

-.998 

7CAT 

.734 

.092 

-.276 

.098 

.092 

.564 

-.266 

.022 

-.268 

.306 

.142 

.304 

-.193 

-.266 

-.343 

.701 

.307 

-.313 

.33? 

.314 

-.317 

-.09C 

TUAT 

-.079 

.069 

-.276 

.067 

.070 

-.404 

.136 

-.005 

-.097 

-.067 

-.518 

-.180 

.136 

.163 

-.021 

-.675 

.678 

-.685 

-.667 

.664 

-.064 

FAT3 

1.000 

.098 

.067 

1.000 

-.055 

-.056 

.086 

-.552 

.154 

-.297 

.679 

-.152 

-.056 

-.079 

-.031 

.030 

-.032 

-.998 

wroMS1 

.098 

1.000 

.092 

.070 

1.000 

-.057 

-.055 

.088 

-.551 

.155 

-.293 

.677 

-.158 

-.055 

-.074 

-.037 

.031 

-.021 

-.033 

-.998 

WTCrew 

.577 

-.058 

.564 

-.404 

-.055 

-.057 

-.188 

.303 

.123 

.280 

.232 

.503 

-.080 

-.188 

-.181 

.506 

.676 

-.678 

.676 

.687 

-.708 

.075 

WTI 

-.053 

-.266 

.136 

-.056 

-.055 

-.188 

-.114 

-.037 

-.163 

.261 

-.151 

-.096 

.845 

-26C 

-.098 

.101 

-.087 

-.103 

.09? 

.06C 

PHV 

.26: 

.088 

.022 

-.005 

.086 

.088 

-.114 

.349 

.349 

.242 

-.160 

-.114 

.014 

.222 

.157 

-.149 

.176 

.159 

-.166 

-.082 

PHV  D 

-.551 

-.268 

.007 

-.552 

-.551 

.123 

-.037 

.349 

.425 

-32C 

.076 

.097 

.067 

.054 

.072 

-.078 

.562 

SHV 

.419 

.154 

.306 

-.097 

.154 

.155 

28C 

-.163 

-.032 

.039 

.426 

.18C 

-.157 

-.163 

.324 

.163 

-.190 

.181 

.170 

-.166 

-.152 

SHV  D 

.347 

-.294 

.142 

-.067 

-.297 

-.293 

.232 

.261 

.349 

.425 

.426 

-.125 

-.057 

.261 

.377 

.119 

-.122 

.143 

.124 

-.132 

.311 

MD 

.677 

.304 

-.518 

.677 

-.151 

.242 

-.320 

.180 

-.125 

-.151 

.122 

-.679 

.687 

.678 

-.677 

-.667 

UCD 

-.232 

-.158 

-.193 

-.180 

-.152 

-.158 

-.080 

-.096 

-.160 

.076 

-.157 

-.057 

-.096 

-.177 

.094 

-.064 

.036 

.097 

-.09? 

.164 

UIT 

-.055 

-.266 

.136 

-.056 

-.055 

-.188 

1.000 

-.114 

-.037 

-.163 

.261 

-.151 

-.096 

1.000 

.845 

-26C 

-.098 

.101 

-.087 

-.103 

.09? 

.06C 

UU 

-.332 

-.074 

-.343 

.163 

-.074 

-.181 

.845 

.014 

.095 

-.147 

.323 

-.157 

.086 

.845 

1.000 

-33C 

-.074 

.086 

-.082 

-.078 

.07C 

.078 

LZFT 

.961 

-.037 

.701 

-.021 

-.037 

.506 

-.260 

.223 

.097 

.324 

.377 

-.177 

-26C 

.178 

-.190 

.214 

.179 

-.196 

.05C 

"MAC  6  EART3 

.30? 

-.034 

.307 

-.675 

-.034 

.676 

-.098 

.157 

.067 

.163 

.119 

.677 

.094 

-.098 

E 

-.998 

.995 

.999 

-.998 

.048 

FMAC  6  HER3 

.031 

-.313 

.678 

.031 

-.678 

.101 

-.149 

-.060 

-.190 

-.122 

-.67? 

-.064 

.101 

BE 

-.998 

-.997 

-.996 

.995 

-.042 

FMAC  6  AATa 

.345 

-.021 

.339 

-.685 

-.021 

.676 

-.087 

.176 

.054 

.181 

.143 

.687 

.036 

-.087 

-.082 

E 

.995 

-.997 

.992 

-.991 

.031 

SMAC  2EART3 

.311 

-.033 

.314 

-.667 

-.033 

.687 

-.103 

.15^ 

.072 

.170 

.124 

.678 

.097 

-.103 

-.078 

be 

.999 

-.996 

-.998 

.048 

SMAC  2  EFR3 

.033 

-.317 

.664 

.039 

-.708 

.099 

-.166 

-.078 

-.166 

-.132 

-.677 

-.099 

.099 

.070 

-.998 

.995 

-.991 

-.998 

-.052 

SMAC  2  AAf 

-.998 

-.090 

-.064 

-.998 

-.998 

.075 

.060 

-.082 

.562 

-.152 

.311 

.164 

06C 

.078 

.046 

-.042 

.031 

.046 

-.052 

Dimension 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

14 

15 

16 

m 

18 

19 

2C 

21 

22 

23 

Eigenvalue 

5.033 

3.248 

2.447 

1.377 

.843 

.544 

.390 

.323 

.188 

.158 

.011 

BE 

be 

.000 



Table F.7. PCA Results for the Moderate scenario, Human-Algorithm planning condition.


Table  F.  8.  Cross-Correlations  for  the  Moderate  Scenario. 


Moderate  Scenario 


TATT 

TAAT 

TAT 

LZFT 

FI 8  6  HER 

F186AAT 

WTI 

UIT 

uu 

TATT 

X 

TAAT 

X 

TAT 

X 

X 

LZFT 

X 

X 

FI 8  6  HER 

X 

FI  8  6  A  AT 

X 

X 

WTIa 

X 

UIT 

X 

X 

UUa 

X 

X 

X 

In this table, an X denotes that the two metrics were highly correlated in all of the
planning conditions (Table F.5 - Table F.7). For example, TATT and LZFT were shown to be
highly correlated in the B, HO, and HA planning conditions within the Moderate scenario.
TATT and TAAT, however, were not highly correlated in at least one of these three
planning conditions.
Correlations Transformed Variables


cm 

E7W] 

HIB1 

KR1 

F!u1 

EM 

SHV  D 

Blil 

|Eal 

SMAC  2  EFR 

SMAC  2  AAT 

mT 

1.000 

.772 

.422 

.438 

mn 

284 

.727 

.419 

.225 

-.134 

-.107 

-.337 

-.133 

.787 

.446 

.121 

-.225 

TAAT 

.772 

1.000 

.426 

.348 

.832 

.148 

.695 

.275 

.180 

.115 

-.110 

.828 

.285 

-.025 

TCAT 

.422 

.426 

.134 

.514 

.491 

.138 

.557 

.041 

.257 

-.421 

-.420 

.434 

.471 

-.361 

.329 

TUAT 

.438 

.348 

.134 

.386 

.494 

.245 

.159 

-.051 

-.475 

-.270 

-.470 

-.248 

.379 

.264 

-.307 

TAT3 

.807 

.832 

.514 

.386 

.265 

.724 

.472 

.312 

-.012 

-.227 

.126 

.718 

.359 

-'Bui 

WTQMS3 

.284 

.148 

.491 

.494 

.265 

mm 

.062 

-.331 

-.327 

-.595 

-.942 

.257 

.248 

.809 

.222 

i/vtc3 

.727 

.695 

.138 

.245 

.724 

.192 

.553 

-.243 

.118 

-.081 

-.198 

.448 

.159 

.246 

-.246 

WTCrcw3 

.419 

.275 

.557 

.159 

.472 

IBS 

.192 

.329 

.361 

-.183 

-.113 

.248 

.289 

.242 

-.181 

m  - 

PHV3 

.225 

.180 

.041 

-.051 

.312 

-.331 

.553 

.329 

.037 

.150 

.262 

-.110 

mm 

-.248 

.210 

-.141 

PHV_D3 

-.134 

-.062 

.257 

-.475 

.053 

-.327 

-.243 

.361 

.367 

.607 

BE 

-.269 

-.521 

.562 

SHV 

-.107 

.115 

-.421 

-.27C 

-.012 

-.595 

.118 

-.183 

.150 

PPPP 

ILl 

.616 

-.248 

BE 

-.571 

.229 

-.178 

SHV_D 

-.337 

-.110 

-.420 

-,47C 

-.942 

-.081 

-.113 

.262 

.367 

.616 

-.195 

-.304 

-.813 

.197 

-.159 

MD 

-.133 

.064 

.434 

-.248 

.126 

.257 

-.198 

.248 

-.110 

.607 

-.248 

-.195 

-.010 

.264 

-.946 

.972 

LZFT 

.787 

.828 

.471 

.407 

.718 

.248 

.448 

.289 

-.122 

-.304 

-.010 

B22 

.432 

.032 

-.111 

SMAC  2  EART 

.446 

.285 

.379 

.359 

.809 

.159 

.242 

-.248 

-.269 

-.571 

-.813 

.264 

KS 

-.319 

.173 

SMAC  2  EFR 

.121 

-.025 

-.361 

.264 

-.069 

-.260 

.246 

-.181 

.210 

-.521 

.229 

.197 

-.946 

is 

-.319 

-.939 

SMAC  2  AAT 

-.225 

.003 

.329 

-.307 

.222 

-.246 

-.141 

.562 

-.178 

-.159 

.972 

-.in 

.173 

-.939 

Dimension 

1 

2 

3 

4 

6 

7 

9 

10 

11 

12 

13 

14 

15 

16 

17 

Eigenvalue 

5.577 

4.110 

2.94C 

1.168 

.929 

.591 

.465 

.275 

.197 

.145 

.110 

.077 

.038 

.020 

.014 

Hitt 



Table  F.  9.  PCA  Results  for  the  Complex  scenario,  Baseline  planning  condition. 


Transformed Variables


i CTB 

B8H 

Mi«»l 

ITT31 

PBE1 

pin 

im 

SMAC  2  EART 

SMAC  2  EFR 

SMAC  2  AAT 

1 

H-* 

7526 

mm 

TOST 

T230 

T250 

T335 

7S92 

7575 

-.106 

.253 

.379 

-.134 

-.335 

-.399 

.779 

-.150 

.024 

-.031 

.178 

-.030 

-.195 

.058 

-.392 

.316 

.171 

.088 

-.057 

.082 

-.040 

.076 

.341 

.392 

.331 

-.281 

-.229 

-.159 

.538 

-.259 

.125 

-.099 

-.067 

.249 

-.042 

.391 

-.149 

-.213 

-.331 

-.418 

.839 

-.086 

-.069 

.046 

.355 

-.150 

-.473 

.220 

-.349 

-.107 

.494 

.469 

.348 

-.301 

.265 

-.314 

.152 

-.028 

.167 

.096 

-.189 

-.288 

-.256 

.614 

.118 

-.057 

.129 

.349 

-.149 

.112 

-.335 

.069 

-.004 

.279 

.196 

-.236 

.182 

.443 

-.197 

-.260 

.130 

.061 

-.343 

.932 

.920 

-.198 

.231 

-.260 

.361 

-.167 

.044 

.169 

-.221 

.023 

-.217 

-.336 

-.339 

.192 

.267 

-.135 

.113 

-.121 

.249 

-.253 

-.124 

-.221 

.025 

-.030 

.000 

-.422 

.015 

-.040 

-.114 

.466 

.444 

-.314 

.177 

-.423 

1.00C 

-.078 

-.17C 

-.210 

-.013 

-.107 

-.076 

.144 

-.053 

.218 

-.145 

.015 

-.076 

.106 

.248 

.137 

-.322 

-.27C 

.081 

-.149 

.291 

-.264 

.502 

-.170 

.105 

.073 

-.227 

.047 

.052 

.418 

-.043 

-.169 

.074 

-.040 

-.210 

.248 

.073 

-.123 

.038 

.014 

-.396 

.441 

-.426 

-.114 

-.012 

.137 

-.227 

-.123| 

-.273 

-.276 

-.293 

-.325 

.101 

-.124 

.466 

-.107 

-.322 

.047 

.038 

-.273 

1.00 

.962 

-.317 

.188 

-.220 

.286 

.444 

-.076 

-.270 

.052 

.014 

-.276 

.962 

1.000 

-.337 

.249 

-.227 

.009 

.144 

.081 

.418 

-.025 

-.293 

-.317 

-.337 

-.123 

-.176 

.126 

.100 

-.053 

-.149 

-.043 

-.396 

-.325 

.188 

.249 

-.123 

-.152 

.259 

-.314 

.210 

.291 

-.169 

.441 

.101 

-.220 

-.227 

-.176 

-.152 

-.905 

.177 

-.145 

-.264 

.074 

-.426 

-.124 

.286 

.304 

.126 

.259 

-.905 

12 

13 

14 

15 

16 

17 

18 

19 

20 

21 

22 

23 

.366 

.333 

.274 

.207 

.197 

.102 

.074 

.031 

.018 

Table  F.  10.  PCA  Results  for  the  Complex  scenario,  Human-Only  planning  condition. 


Table  F.  12.  Cross-correlations  for  the  Complex  scenario. 


Complex  Scenario 


TATT 

TAAT 

TAT 

LZFT 

WTI 

UIT 

uu 

TATT1’ 

X 

TAAT 

X 

TATa 

X 

X 

LZFT 

X 

X 

WTI 

X 

UITa 

X 

X 

UUa 

X 

X 

X 

In this table, an X denotes that the two metrics were highly correlated in all of the
planning conditions (Table F.9 - Table F.11). For example, TATT and LZFT were shown to be
highly correlated in the B, HO, and HA planning conditions within the Complex scenario.
TATT and TAAT, however, were not highly correlated in at least one of these three
planning conditions.
Appendix G - Normality Tests


Table G.1. Results of Kolmogorov-Smirnov normality tests for the Simple scenario
(asterisks denote that data was non-normal).


Metric 

Treatment 

Statistic 

df 

Sig. 

LZFT 

B 

0.115 

30 

0.200 

HO 

0.152 

30 

0.073 

HA* 

0.385 

28 

0.000 

TATT 

B 

HO*

HA* 

TAAT 

B 

0.137 

30 

0.159 

HO 

0.104 

30 

0.200 

HA* 

0.176 

28 

0.026 

TCAT 

B 

0.151 

30 

0.080 

HO 

0.145 

30 

0.107 

HA 

0.092 

28 

0.200 

MD 

B 

0.121 

30 

0.200 

HO*

0.193 

30 

0.006 

HA 

0.153 

28 

0.091 

C2LR 

B 

* 

O 

X 

HA* 

Metric 

Treatment 

Statistic 

df 

Sig. 

C3LR 

B 

■ m< 

0.200 

HO* 

30 

0.013 

HA* 

28 

0.000 

C4LR 

B 

0.127 

30 

0.200 

HO* 

0.182 

30 

0.013 

HA* 

0.358 

28 

0.000 

TCLR 

B 

30 

0.200 

HO* 

0.013 

HA* 

28 

0.000 

HV 

B 

0.123 

30 

0.200 

HO 

0.099 

30 

0.200 

HA* 

0.180 

28 

0.020 

HV-D 

B* 

30 

0.000 

HO 

0.200 

HA* 

28 

0.000 

UIT 

B 

- 

- 

- 

HO* 

0.167 

30 

0.032 

HA* 

0.196 

28 

0.007 

Table G.2. Results of Kolmogorov-Smirnov normality tests for the Moderate scenario
(asterisks denote that data was non-normal).

Metric        Treatment  Statistic  df  Sig.
TAAT          B          0.085      30  0.200
              HO*        0.275      29  0.000
              HA         0.089      28  0.200
TCAT          B          0.090      30  0.200
              HO         0.089      29  0.200
              HA         0.103      28  0.200
WTQMS         B*         0.167      30  0.032
              HO*        0.367      29  0.000
              HA         0.088      28  0.200
MD            B          0.063      30  0.200
              HO*        0.183      29  0.014
              HA         0.105      28  0.200
HV            B*         0.195      30  0.005
              HO         0.106      29  0.200
              HA         0.096      28  0.200
HV-D          B*         0.163      30  0.040
              HO         0.144      29  0.131
              HA         0.127      28  0.200
UIT           B          -          -   -
              HO*        0.222      29  0.001
              HA         0.153      28  0.090
LZFT          B          0.113      30  0.200
              HO         0.156      29  0.070
              HA         0.137      28  0.192
FMAC_6_HFR    B*         0.457      30  0.000
              HO*        0.369      29  0.000
              HA*        0.536      28  0.000
FMAC_6_EART   B*         0.342      30  0.000
              HO*        0.304      29  0.000
              HA*        0.223      28  0.001
SMAC_2_EART   B*         0.256      30  0.000
              HO         -          -   -
              HA         0.134      28  0.200
SMAC_2_EFR    B          0.137      30  0.155
              HO         0.130      29  0.200
              HA         0.120      28  0.200
SMAC_2_AAT    B          0.102      30  0.200
              HO         0.121      29  0.200
              HA         0.092      28  0.200
TATT          B          0.098      30  0.200
              HO         0.133      29  0.200
              HA         0.107      28  0.200
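
For readers unfamiliar with the Statistic column, the sketch below computes a Kolmogorov-Smirnov distance between a sample and a normal distribution. It assumes the normal distribution is parameterized by the sample mean and sample standard deviation, which this appendix does not state, and it does not reproduce the Sig. values, whose derivation is also not described here. The function names are illustrative.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_normal(sample: Sequence[float]) -> float:
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)  # assumed: parameters estimated from the data
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps at x: check the gap just after and just before.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


d_example = ks_statistic_normal([1.0, 2.0, 3.0, 4.0, 5.0])
```

Larger values of the statistic indicate a larger departure from normality; whether a given value is flagged with an asterisk depends on the sample size (the df column) and the significance procedure used.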
Table G.3. Results of Kolmogorov-Smirnov normality tests for the Complex scenario
(asterisks denote that data was non-normal).

Metric        Treatment  Statistic  df  Sig.
MD            B          0.134      27  0.200
              HO         0.111      25  0.200
              HA*        0.175      27  0.032
C2LR          B          0.134      27  0.200
              HO         0.111      25  0.200
              HA         -          -   -
C3LR          B          -          -   -
              HO         -          -   -
              HA*        0.172      27  0.040
C4LR          B          -          -   -
              HO         -          -   -
              HA*        0.172      27  0.040
TCLR          B          0.134      27  0.200
              HO         0.111      25  0.200
              HA*        0.172      27  0.040
HV            B          0.141      27  0.177
              HO*        0.255      25  0.000
              HA         0.145      27  0.153
HV-D          B*         0.313      27  0.000
              HO*        0.401      25  0.000
              HA*        0.210      27  0.004
UIT           B          -          -   -
              HO*        0.286      25  0.000
              HA*        0.197      27  0.009
LZFT          B          0.130      27  0.200
              HO         0.084      25  0.200
              HA         0.147      27  0.142
SMAC_2_AAT    B          0.164      27  0.059
              HO         0.154      25  0.127
              HA         0.161      27  0.070
SMAC_2_EFR    B          0.145      27  0.152
              HO         0.134      25  0.200
              HA         0.158      27  0.084
SMAC_2_EART   B*         0.303      27  0.000
              HO*        0.347      25  0.000
              HA*        0.170      27  0.044
TATT          B          0.099      27  0.200
              HO         0.120      25  0.200
              HA         0.119      27  0.200
TAAT          B          0.120      27  0.200
              HO         0.079      25  0.200
              HA*        0.343      27  0.000
TCAT          B          0.110      27  0.200
              HO         0.161      25  0.095
              HA         0.103      27  0.200
WTQMS         B*         0.187      27  0.016
              HO         0.156      25  0.117
              HA*        0.324      27  0.000
Appendix H - Tests for Heteroskedasticity


Table H.1. Results of Heteroskedasticity tests for pairwise comparisons in the Simple
scenario.

Comparison  Metric  Basis          Levene Statistic  df1  df2  Sig.
B vs. HO    LZFT    Based on Mean            10.410    1   58  .002
            TATT    Based on Mean            15.481    1   58  .000
            TAAT    Based on Mean            15.975    1   58  .000
            TCAT    Based on Mean            27.719    1   58  .000
            MD      Based on Mean             6.501    1   58  .013
            C2LR    Based on Mean            10.124    1   58  .002
            C3LR    Based on Mean            17.286    1   58  .000
            C4LR    Based on Mean             9.361    1   58  .003
            TCLR    Based on Mean             4.021    1   58  .050
            HV      Based on Mean            12.746    1   58  .001
            HV-D    Based on Mean              .777    1   58  .382
B vs. HA    LZFT    Based on Mean             2.426    1   56  .125
            TATT    Based on Mean             2.660    1   56  .109
            TAAT    Based on Mean            11.738    1   56  .001
            TCAT    Based on Mean            52.938    1   56  .000
            MD      Based on Mean            10.275    1   56  .002
            C2LR    Based on Mean            15.030    1   58  .000
            C3LR    Based on Mean             6.470    1   58  .014
            C4LR    Based on Mean            13.328    1   58  .001
            TCLR    Based on Mean             7.140    1   58  .010
            HV      Based on Mean             5.884    1   56  .019
            HV-D    Based on Mean            65.170    1   56  .000
HO vs. HA   LZFT    Based on Mean              .211    1   56  .647
            TATT    Based on Mean              .238    1   56  .627
            TAAT    Based on Mean             3.786    1   56  .057
            TCAT    Based on Mean             4.948    1   56  .030
            MD      Based on Mean             3.091    1   56  .084
            C2LR    Based on Mean            13.100    1   58  .001
            C3LR    Based on Mean             3.064    1   58  .085
            C4LR    Based on Mean            10.508    1   58  .002
            TCLR    Based on Mean             3.064    1   58  .085
            HV      Based on Mean              .270    1   56  .605
            HV-D    Based on Mean            81.640    1   56  .000
            UIT     Based on Mean             6.148    1   56  .016
Table H.2. Results of Heteroskedasticity tests for pairwise comparisons in the Moderate
scenario.


Comparison  Metric       Basis          Levene Statistic  df1  df2  Sig.
B vs. HO    LZFT         Based on Mean  .139              1    57   .711
B vs. HO    FMAC_6_HFR   Based on Mean  6.228             1    57   .015
B vs. HO    FMAC_6_EART  Based on Mean  21.729            1    57   .000
B vs. HO    SMAC_2_EART  Based on Mean  -                 -    -    -
B vs. HO    SMAC_2_EFR   Based on Mean  .029              1    57   .864
B vs. HO    SMAC_2_AAT   Based on Mean  .007              1    57   .933
B vs. HO    TATT         Based on Mean  2.198             1    57   .144
B vs. HO    TAAT         Based on Mean  13.907            1    57   .000
B vs. HO    TCAT         Based on Mean  5.446             1    57   .023
B vs. HO    WTQMS        Based on Mean  21.585            1    57   .000
B vs. HO    MD           Based on Mean  17.336            1    57   .000
B vs. HO    HV           Based on Mean  .861              1    57   .357
B vs. HO    HV-D         Based on Mean  27.354            1    57   .000
B vs. HA    LZFT         Based on Mean  1.193             1    56   .279
B vs. HA    FMAC_6_HFR   Based on Mean  20.506            1    56   .000
B vs. HA    FMAC_6_EART  Based on Mean  40.396            1    56   .000
B vs. HA    SMAC_2_EART  Based on Mean  63.484            1    56   .000
B vs. HA    SMAC_2_EFR   Based on Mean  4.496             1    56   .038
B vs. HA    SMAC_2_AAT   Based on Mean  1.569             1    56   .216
B vs. HA    TATT         Based on Mean  .232              1    56   .632
B vs. HA    TAAT         Based on Mean  6.177             1    56   .016
B vs. HA    TCAT         Based on Mean  1.346             1    56   .251
B vs. HA    WTQMS        Based on Mean  37.234            1    56   .000
B vs. HA    MD           Based on Mean  16.566            1    56   .000
B vs. HA    HV           Based on Mean  15.082            1    56   .000
B vs. HA    HV-D         Based on Mean  34.533            1    56   .000
HO vs. HA   LZFT         Based on Mean  1.967             1    55   .166
HO vs. HA   FMAC_6_HFR   Based on Mean  47.875            1    55   .000
HO vs. HA   FMAC_6_EART  Based on Mean  10.785            1    55   .002
HO vs. HA   SMAC_2_EART  Based on Mean  -                 -    -    -
HO vs. HA   SMAC_2_EFR   Based on Mean  4.909             1    55   .031
HO vs. HA   SMAC_2_AAT   Based on Mean  1.350             1    55   .250
HO vs. HA   TATT         Based on Mean  3.121             1    55   .083
HO vs. HA   TAAT         Based on Mean  5.395             1    55   .024
HO vs. HA   TCAT         Based on Mean  .913              1    55   .343
HO vs. HA   WTQMS        Based on Mean  9.653             1    55   .003
HO vs. HA   MD           Based on Mean  .921              1    55   .341
HO vs. HA   HV           Based on Mean  18.416            1    55   .000
HO vs. HA   HV-D         Based on Mean  .973              1    55   .328
HO vs. HA   UIT          Based on Mean  27.408            1    55   .000
Table H.3. Results of Heteroskedasticity tests for pairwise comparisons in the Complex
scenario.


Comparison  Metric       Basis          Levene Statistic  df1  df2  Sig.
B vs. HO    LZFT         Based on Mean  2.292             1    50   .136
B vs. HO    SMAC_2_AAT   Based on Mean  .287              1    50   .595
B vs. HO    SMAC_2_EFR   Based on Mean  .607              1    50   .439
B vs. HO    SMAC_2_EART  Based on Mean  3.382             1    50   .072
B vs. HO    TATT         Based on Mean  2.096             1    50   .154
B vs. HO    TAAT         Based on Mean  1.238             1    50   .271
B vs. HO    TCAT         Based on Mean  .032              1    50   .860
B vs. HO    WTQMS        Based on Mean  .436              1    50   .512
B vs. HO    MD           Based on Mean  1.722             1    50   .195
B vs. HO    C2LR         Based on Mean  1.681             1    50   .201
B vs. HO    TCLR         Based on Mean  1.681             1    50   .201
B vs. HO    HV           Based on Mean  .832              1    50   .366
B vs. HO    HV-D         Based on Mean  30.286            1    50   .000
B vs. HA    LZFT         Based on Mean  12.088            1    52   .001
B vs. HA    SMAC_2_AAT   Based on Mean  .982              1    52   .326
B vs. HA    SMAC_2_EFR   Based on Mean  1.029             1    52   .315
B vs. HA    SMAC_2_EART  Based on Mean  15.733            1    52   .000
B vs. HA    TATT         Based on Mean  .002              1    52   .961
B vs. HA    TAAT         Based on Mean  23.367            1    52   .000
B vs. HA    TCAT         Based on Mean  .369              1    52   .546
B vs. HA    WTQMS        Based on Mean  31.547            1    52   .000
B vs. HA    MD           Based on Mean  12.346            1    52   .001
B vs. HA    C2LR         Based on Mean  -                 -    -    -
B vs. HA    C3LR         Based on Mean  -                 -    -    -
B vs. HA    C4LR         Based on Mean  -                 -    -    -
B vs. HA    TCLR         Based on Mean  8.217             1    52   .006
B vs. HA    HV           Based on Mean  27.620            1    52   .000
B vs. HA    HV-D         Based on Mean  101.324           1    52   .000
HO vs. HA   LZFT         Based on Mean  2.789             1    50   .101
HO vs. HA   SMAC_2_AAT   Based on Mean  1.581             1    50   .214
HO vs. HA   SMAC_2_EFR   Based on Mean  1.870             1    50   .178
HO vs. HA   SMAC_2_EART  Based on Mean  15.679            1    50   .000
HO vs. HA   TATT         Based on Mean  1.726             1    50   .195
HO vs. HA   TAAT         Based on Mean  20.810            1    50   .000
HO vs. HA   TCAT         Based on Mean  .711              1    50   .403
HO vs. HA   WTQMS        Based on Mean  29.464            1    50   .000
HO vs. HA   MD           Based on Mean  14.526            1    50   .000
HO vs. HA   C2LR         Based on Mean  -                 -    -    -
HO vs. HA   C3LR         Based on Mean  -                 -    -    -
HO vs. HA   C4LR         Based on Mean  -                 -    -    -
HO vs. HA   TCLR         Based on Mean  10.908            1    50   .002
HO vs. HA   HV           Based on Mean  11.456            1    50   .001
HO vs. HA   HV-D         Based on Mean  286.921           1    50   .000
HO vs. HA   UIT          Based on Mean  39.199            1    50   .000
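
To make the columns of the heteroskedasticity tables easier to read, the following sketch computes a Levene statistic centered on the group means, together with its two degrees of freedom. It assumes the standard textbook form of the test; the function name `levene_mean` and the sample data are hypothetical, and the Sig. value is not computed because that requires an F-distribution routine outside the Python standard library.

```python
from statistics import mean


def levene_mean(*groups):
    """Return (W, df1, df2) for Levene's test based on group means."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every observation from its own group mean.
    z = []
    for g in groups:
        center = mean(g)
        z.append([abs(y - center) for y in g])

    z_bar_i = [mean(zi) for zi in z]                 # per-group mean deviation
    z_bar = sum(sum(zi) for zi in z) / n_total       # overall mean deviation

    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)

    df1, df2 = k - 1, n_total - k
    return (df2 / df1) * between / within, df1, df2


# Hypothetical data: the second group is twice as spread out as the first.
w, df1, df2 = levene_mean([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Degrees of freedom for two groups of 27 and 25 observations.
_, _, df2_27_25 = levene_mean(list(range(27)), list(range(25)))
```

For a pairwise comparison df1 is always 1 and df2 is the combined sample size minus 2. Assuming the df column of the Complex scenario normality table gives the sample sizes (27 for B and HA, 25 for HO), this yields 27 + 25 - 2 = 50 for comparisons involving HO and 27 + 27 - 2 = 52 for B vs. HA, which agrees with the df2 values listed for the Complex scenario.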
Appendix I - Boxplots for the Simple Scenario
Figure I.1. Landing Zone Foul Time (LZFT).
Figure I.3. Total Aircraft Active Time (TAAT).
Figure I.4. Total Crew Active Time (TCAT).
Figure I.5. Mission Duration (MD).
Figure I.6. Catapult 2 Launch Rate (C2LR).
Figure I.7. Catapult 4 Launch Rate (C4LR).
Figure I.8. Total Catapult Launch Rate (TCLR).
Figure I.9. Halo Violations (HV).
Figure I.10. Halo Violation Durations (HV-D). Spikes in plot imply 95% confidence
interval notches extend past included data.
Figure I.11. User Interaction Count (UIC). Spikes in plot imply 95% confidence interval
notches extend past included data.
Figure I.12. User Interaction Time (UIT).
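
Several captions above note that spikes appear where the 95% confidence interval notches extend past the included data. The sketch below shows one way such a condition can arise. It assumes the common notch convention of median plus or minus 1.57 times the interquartile range divided by the square root of the sample size, and it interprets "past included data" as the notch reaching beyond the box quartiles; neither assumption is confirmed by the source, and the function name `notch_interval` and the data are hypothetical.

```python
from math import sqrt
from statistics import median, quantiles


def notch_interval(data, factor=1.57):
    """Return (lower, upper, extends_past_box) for a boxplot notch.

    `factor` = 1.57 is an assumed convention, not a value taken
    from the source document.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    med = median(data)
    half_width = factor * (q3 - q1) / sqrt(len(data))
    lower, upper = med - half_width, med + half_width
    # The notch "spikes" when it reaches beyond either quartile.
    extends_past_box = lower < q1 or upper > q3
    return lower, upper, extends_past_box


# Hypothetical data: a small sample gives a wide notch, a large one does not.
small_lower, small_upper, small_spikes = notch_interval(list(range(1, 10)))
large_lower, large_upper, large_spikes = notch_interval(list(range(1, 102)))
```

Under these assumptions the notch narrows as the sample size grows, so spikes are most likely for small samples or for metrics whose values cluster tightly around the median.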
Appendix J - Boxplots for the Moderate Scenario
Figure J.1. Landing Zone Foul Time (LZFT). Spikes in plot imply 95% confidence
interval notches extend past included data.
Figure J.2. FMAC #6 Hydraulic Fluid Remaining (FMAC 6 HFR). Spikes in plot imply
95% confidence interval notches extend past included data.
Figure J.3. FMAC #6 Emergency Aircraft Recovery Time (FMAC 6 EART).
Figure J.4. SMAC #2 Aircraft Active Time (SMAC 2 AAT).
Figure J.5. SMAC #2 Emergency Fuel Remaining (SMAC 2 EFR).


Figure J.6. SMAC #2 Emergency Aircraft Recovery Time (SMAC 2 EART).
Figure J.7. Total Aircraft Taxi Time (TATT).
Figure J.8. Total Aircraft Active Time (TAAT).
Figure J.9. Total Crew Active Time (TCAT).
Figure J.10. Wait Time in Queue in Marshal Stack (WTQMS).
Figure J.11. Mission Duration (MD).
Figure J.12. Halo Violations (HV).
Figure J.13. Halo Violation Durations (HV-D).


[Boxplot of UIC by Planning Condition.]
Figure J.14. User Interaction Count (UIC).
Figure J.15. User Interaction Time (UIT).
Appendix K - Boxplots for the Complex Scenario
Figure K.1. Fuel Violation (FV).
Figure K.2. Landing Zone Foul Time (LZFT).
Figure K.3. SMAC #2 Aircraft Active Time (SMAC 2 AAT).
Figure K.4. SMAC #2 Emergency Fuel Remaining (SMAC 2 EFR).
Figure K.5. SMAC #2 EART (SMAC 2 EART).
Figure K.6. Total Aircraft Taxi Time (TATT).
Figure K.7. Total Aircraft Active Time (TAAT).
Figure K.8. Total Crew Active Time (TCAT).
Figure K.9. Wait Time in Queue in Marshal Stack (WTQMS).
Figure K.10. Mission Duration (MD).
Figure K.11. Catapult 2 Launch Rate (C2LR).
Figure K.12. Catapult 3 Launch Rate (C3LR).
Figure K.13. Catapult 4 Launch Rate (C4LR).


[Boxplot of TCLR by Planning Condition.]
Figure K.14. Total Catapult Launch Rate (TCLR).


[Boxplot of HV by Planning Condition.]
Figure K.15. Halo Violations (HV).
Figure K.16. Halo Violation Durations (HV-D). Spikes in plot imply 95% confidence
interval notches extend past included data.
Figure K.17. User Interaction Count (UIC). Spikes in plot imply 95% confidence interval
notches extend past included data.
Figure K.18. User Interaction Time (UIT).